Introduction to FTS Paolo Tedesco White Area Lecture 3 October 2008


1 Introduction to FTS
Paolo Tedesco (paolo.tedesco@cern.ch)
White Area Lecture, 3 October 2008

2 What is FTS?
- File Transfer Service (FTS) is a data movement service
- Used by experiment frameworks
- High-volume data streams need to be managed
  - Balance site resource usage
  - Prevent network overload
  - Prevent storage overload
  - Job prioritization
  - Service monitoring and statistics

The File Transfer Service (FTS) is a data movement service. It is used by the experiment frameworks; end users typically do not interact with it directly. Experiment frameworks submit jobs to FTS; a job is a set of source and destination file name pairs.

These data streams have such high volumes that they need to be managed. FTS makes it possible to:
- balance usage of site resources according to VO and site policies
- prevent network overload
- prevent storage overload
- monitor and understand problems on the service

FTS provides:
- to users, reliable point-to-point movement of files
- to site managers, a manageable way of serving file movement requests from their experiments, with control and monitoring
- to VO managers, the ability to control requests coming from their users, and to re-order and prioritize them

3 What is a channel?
- Single-direction management queue for transfer jobs
- Not tied to a physical network path
- An endpoint (source or destination) can be defined as:
  - single site: CERN - RAL
  - site group (formerly known as “cloud”): CERN - [EU], where [EU] = INFN, IN2P3, PIC ...
  - catch-all: CERN - *

For FTS, a channel is a transfer management queue. Once a job is submitted, it is assigned to the most suitable channel based on the source and destination endpoints, the VO of the user, and the channel topology configured on the server. The concept of a channel is not related to a physical network path.

All file transfers on the same channel are served as part of the same queue. On this queue it is possible to set inter-VO shares (for example, ATLAS gets 75% and CMS gets the rest) or priorities within a VO.

Each channel can be configured to use a particular transfer method (gridftp or srmcopy) and has its own parameters (number of concurrent files, number of streams, TCP buffer size, etc.). It is also possible to limit the number of concurrent transfers on the same storage element at a given time.
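
The three endpoint kinds above suggest a simple way to picture channel resolution. The following is a minimal sketch, not the FTS implementation: it assumes that a single-site endpoint is preferred over a site group, and a site group over the catch-all, and it ignores the VO of the user. The `SITE_GROUPS` and `CHANNELS` tables and both function names are hypothetical.

```python
from typing import Optional, Tuple

# Hypothetical site group, following the slide's example [EU] = INFN, IN2P3, PIC ...
SITE_GROUPS = {"[EU]": {"INFN", "IN2P3", "PIC"}}

# Hypothetical channel topology: (source endpoint, destination endpoint).
CHANNELS = [("CERN", "RAL"), ("CERN", "[EU]"), ("CERN", "*")]


def endpoint_rank(endpoint: str, site: str) -> Optional[int]:
    """Return how specifically an endpoint matches a site (lower is better)."""
    if endpoint == site:
        return 0  # single site
    if site in SITE_GROUPS.get(endpoint, set()):
        return 1  # site group
    if endpoint == "*":
        return 2  # catch-all
    return None  # no match


def resolve_channel(source: str, dest: str) -> Optional[Tuple[str, str]]:
    """Pick the matching channel whose endpoints are most specific.

    The precedence (site, then group, then catch-all) is an assumption.
    """
    best, best_score = None, None
    for src_ep, dst_ep in CHANNELS:
        s = endpoint_rank(src_ep, source)
        d = endpoint_rank(dst_ep, dest)
        if s is None or d is None:
            continue
        score = s + d
        if best_score is None or score < best_score:
            best, best_score = (src_ep, dst_ep), score
    return best
```

With this table, a CERN to PIC transfer lands on the CERN - [EU] channel, while a RAL to CERN transfer matches nothing, because channels are single-direction.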

4 What is a channel? (contd)
A channel defines:
- Transfer protocol: gridftp, srmcopy
- Transfer limits: maximum number of concurrent transfers
- Transfer parameters: TCP streams, buffer size, timeouts
- VO shares
  - “quota” of a VO as a percentage
  - share calculation mechanism (absolute, normalized, normalized on active)
- Transfer priorities within the same VO
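
The slide names three share calculation mechanisms without defining them. The sketch below shows one plausible reading, purely as an assumption and not as the FTS algorithm: "absolute" takes the share value as a percentage directly, "normalized" divides each share by the sum of all shares, and "normalized on active" divides by the sum over VOs that currently have work queued. The function name and mode strings are hypothetical.

```python
def share_fractions(shares, mode, active=None):
    """Turn per-VO share values into fractions of a channel's capacity.

    shares: dict mapping VO name to its configured share value.
    mode:   "absolute", "normalized" or "normalized_on_active".
    active: set of VOs with queued work (only used by the last mode).
    All three interpretations are assumptions made for illustration.
    """
    if mode == "absolute":
        return {vo: value / 100.0 for vo, value in shares.items()}
    if mode == "normalized":
        pool = dict(shares)
    elif mode == "normalized_on_active":
        active = active or set()
        pool = {vo: value for vo, value in shares.items() if vo in active}
    else:
        raise ValueError("unknown mode: " + mode)
    total = sum(pool.values())
    return {vo: (pool.get(vo, 0) / total if total else 0.0) for vo in shares}
```

Under this reading, two VOs that both have a share of 100 each get half of the channel when normalized, and an idle VO gives up its part when normalizing on active VOs.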

5 Server architecture
- Decoupled components
  - Web service
  - VO agents
  - Channel agents
  - Monitoring service

FTS is designed for high availability and scalability. It consists of a set of decoupled components, each of which interacts only with the central Oracle database.

The web service is the only component that the user interacts with; it allows users to submit jobs and query their status.

Files and jobs progress through a well-defined state machine. VO agents and channel agents are responsible for different phases of the file/job lifecycle, depending on whether a given phase has VO-specific or channel-specific settings.

Finally, the monitoring service collects transfer statistics.

6 Web service
- Stateless
- Load-balanced
  - scalability
  - software upgrade with zero user-visible downtime
  - graceful failover if one node dies
- APIs
  - Job submission / tracking
  - Service / channel management

The web service is the most critical component: it MUST stay up at all times, otherwise users “see” downtime because their commands are rejected.

The web service is stateless and designed to be load-balanced. This lets it scale, allows software upgrades to be rolled out with zero user-visible downtime, and permits graceful failover if one node dies. Currently DNS load balancing is used at CERN (over 3 mid-range servers).

The web service offers APIs for:
- Job submission and tracking: used by applications to submit transfer job requests and poll for their status. The files progress through a well-defined state machine.
- Service / channel management: used by service administrators and VO production managers to control the service.

7 Agents
- Daemons that run a set of tasks
- Each task operates on a particular state
- One agent per VO / channel
- Independent from each other
- Split across multiple nodes

States: Submitted -> Ready -> Active
- Allocate: find a channel between the source and the destination site where the VO has a share.
- Fetch: select a transfer according to shares and priorities and start it.

The transfer agents are daemons that run a set of independent tasks. These tasks are run periodically, and each of them operates on some part of the FTS file state machine, i.e. it looks for jobs in a particular state so that it can act on them.

These tasks are ‘bundled’ together into agent daemons: if a task should behave differently depending on which VO the job belongs to, it is part of a VO agent; if it should behave differently depending on which channel the job is allocated to, it is part of a channel agent. There is a VO agent for each VO and a channel agent for each channel.

VO agents are responsible for VO-specific parts of the transfer (e.g. updating the replica catalog for a given VO or applying VO-specific retry policies in case of failed transfers). Each VO has a distinct VO agent daemon running for it. Each channel, e.g. CERN-RAL, has a distinct daemon running transfers for it; this daemon is responsible for starting and controlling transfers on the associated network link.

The agents are independent from each other. Scalability is achieved by splitting them across multiple nodes as needed; typically all VO agents are put on one node, and channel agents (which do most of the work) are put on multiple nodes.

The web service availability is critical because the user immediately perceives downtime. Agents are less critical in this sense, also because stopping one does not affect the others. Of course there are limits: for example, if all the VO agents are stopped, nothing will feed new jobs to the channel agents.
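
The idea of tasks that each pick up work in one state and move it to the next can be sketched as follows. This is an illustration under assumptions, not FTS code: it assumes Allocate moves a transfer from Submitted to Ready and Fetch moves it from Ready to Active, that a larger number means a higher priority, and it leaves out VO shares entirely. The `Transfer` class and the `CHANNELS` table are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transfer:
    vo: str
    source_site: str
    dest_site: str
    priority: int = 3
    state: str = "Submitted"
    channel: Optional[str] = None


# Hypothetical topology: (source, destination) -> (channel name, VOs with a share).
CHANNELS = {("CERN", "RAL"): ("CERN-RAL", {"atlas", "dteam"})}


def allocate(queue: List[Transfer]) -> None:
    """Allocate task: look only at Submitted transfers and assign a channel
    where the VO has a share (assumed transition: Submitted -> Ready)."""
    for transfer in queue:
        if transfer.state != "Submitted":
            continue
        entry = CHANNELS.get((transfer.source_site, transfer.dest_site))
        if entry is not None and transfer.vo in entry[1]:
            transfer.channel, transfer.state = entry[0], "Ready"


def fetch(queue: List[Transfer], channel: str, free_slots: int) -> List[Transfer]:
    """Fetch task: look only at Ready transfers on one channel and start the
    highest-priority ones (assumed transition: Ready -> Active)."""
    ready = [t for t in queue if t.state == "Ready" and t.channel == channel]
    ready.sort(key=lambda t: -t.priority)  # assumption: larger means more urgent
    started = ready[:free_slots]
    for transfer in started:
        transfer.state = "Active"
    return started
```

Because each task only touches transfers in its own state, the two functions can run periodically and independently, which is the property that lets the real agents be split across nodes.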

8 FTM node
- File Transfer Monitor (FTM) node
  - Schema additions for FTS database
  - Periodically queries FTS database
  - Creates weekly/daily/hourly summaries
  - Publishes summaries into gridview
    - Currently publishing data rates for site-to-site transfers
- Pluggable architecture
  - Incremental approach
  - Easy to extend
  - Add external contributions

A new node type, FTM, has been created. The File Transfer Monitor node needs a schema addition to the FTS database, where it stores the results of its queries to the FTS database.

9 Security
- Transfers are run using the clients’ X509 credentials
  - delegated by the client to the service (impersonation)
- Full audit on all administrative operations
  - i.e. modifying channel parameters, VO shares, adding and removing VO or channel managers
  - all operations recorded in the logs, not (yet) in the DB
- VOMS credentials (attribute certificates) used (and renewed as necessary) in FTS >= 2.0

Recording operations in the DB would make them easier to access and would allow providing a web interface.

10 Security: Roles
- Roles
  - VO production manager
  - Channel administrator
  - Service manager
- Machine certificate of web service is by default a service manager
- Get a list of your roles with glite-transfer-getroles

    ~ $ glite-transfer-getroles
    Your current clientDN is: /DC=ch/DC=cern/.../CN=Paolo Tedesco
    You are authorised to submit to this service because your cert contains the following principal: /DC=ch/DC=cern/.../CN=Paolo Tedesco
    You are VO manager for 1 VOs.
    You are VO manager for VO <dteam> because your cert contains the following principal: /DC=ch/DC=cern/.../CN=Paolo Tedesco
    You are channel manager for 1 channels.
    You are channel manager for channel <CERN-CERN> because your cert contains the following principal: /DC=ch/DC=cern/.../CN=Paolo Tedesco

11 FTS and SRMs
- FTS uses the SRM interface to set up/run/finalize the transfer
- Actual copy
  - 3rd party gridFTP
  - SrmCopy
- FTS server is typically located on either the source or destination site
  - Administrative convenience
  - Not a requirement

[Diagram: data flow between the SRM on site A and the SRM on site B]

12 Information system
- File as plugin
  - /opt/glite/etc/services.xml
- Cache BDII information
  - mostly ‘static’
  - avoid heavy queries to information system
  - add resilience to glitches
- Updated by the sd2cache package

Resilience: if a site goes down for maintenance, its record can disappear from the information system, which is undesirable for FTS. The sd2cache tool creates a sticky cache: new and existing entries are overwritten in the file with the information from BDII, while old entries are kept in the file.
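
The sticky-cache behaviour described above amounts to a merge in which fresh information wins but nothing is ever dropped. A minimal sketch, assuming records are keyed by service endpoint and held in plain dictionaries (the real sd2cache tool works on services.xml, and the function name here is hypothetical):

```python
def sticky_merge(cache, bdii):
    """Merge BDII records into the cached ones.

    Entries present in BDII (new or existing) overwrite the cached copy;
    entries that have disappeared from BDII are kept as they were.
    """
    merged = dict(cache)  # start from everything already cached
    merged.update(bdii)   # overwrite or add whatever BDII currently publishes
    return merged
```

For example, if a site is missing from BDII during a maintenance window, its cached record survives the merge unchanged, so channel resolution for that site keeps working.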

13 services.xml: SRM endpoint

    <service name="httpg://lxdpm101.cern.ch:8446/srm/managerv2">
      <parameters>
        <endpoint>httpg://lxdpm101.cern.ch:8446/srm/managerv2</endpoint>
        <type>SRM</type>
        <version>2.2.0</version>
        <site>CERN-PROD</site>
        <wsdl>unset</wsdl>
        <volist>
          <vo>atlas</vo>
          <vo>cms</vo>
          <vo>dteam</vo>
        </volist>
        <param name="atlas:SEMountPoint">/dpm/cern.ch/home/atlas</param>
        <param name="cms:SEMountPoint">/dpm/cern.ch/home/cms</param>
        <param name="dteam:SEMountPoint">/dpm/cern.ch/home/dteam</param>
      </parameters>
    </service>

- version: check that the version is published correctly (No SRM Method factory…)
- site: this is the site used for channel resolution (No channel found…)
- service entry and volist: check that the entry for the service is defined and contains your VO in the volist (No site for host…)
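
The three checks listed for this entry can be automated. The sketch below uses Python's standard `xml.etree.ElementTree`; it assumes the `<service>` elements sit under some root element (the `<services>` wrapper in `SAMPLE` is a guess, since the slide shows only one entry), and the function name and problem messages are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the slide's entry; the <services> root is an assumption.
SAMPLE = """<services>
  <service name="httpg://lxdpm101.cern.ch:8446/srm/managerv2">
    <parameters>
      <endpoint>httpg://lxdpm101.cern.ch:8446/srm/managerv2</endpoint>
      <type>SRM</type>
      <version>2.2.0</version>
      <site>CERN-PROD</site>
      <volist><vo>atlas</vo><vo>cms</vo><vo>dteam</vo></volist>
    </parameters>
  </service>
</services>"""


def check_service(xml_text, name, vo):
    """Return a list of problems found for one service entry and one VO."""
    root = ET.fromstring(xml_text)
    service = next((s for s in root.iter("service") if s.get("name") == name), None)
    if service is None:
        return ["no <service> entry for " + name]
    problems = []
    if not (service.findtext("parameters/version") or "").strip():
        problems.append("version is not published")
    if not (service.findtext("parameters/site") or "").strip():
        problems.append("site is not published")
    vos = [v.text for v in service.findall("parameters/volist/vo")]
    if vo not in vos:
        problems.append("VO " + vo + " is not in the volist")
    return problems
```

Calling `check_service(SAMPLE, "httpg://lxdpm101.cern.ch:8446/srm/managerv2", "dteam")` returns an empty list, while asking for a VO that is not listed reports the missing volist entry.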

14 Configuration
- Agents
  - Configuration files under /opt/glite/etc/glite-data-transfer-agents.d
    - agent type: channel / vo
    - transfer type: urlcopy / srmcopy (channel agents)
    - channel or VO name
    - properties and log-properties

    /opt/glite/etc/glite-data-transfer-agents.d $ ls
    glite-transfer-channel-agent-srmcopy-CERN-FNAL.log-properties
    glite-transfer-channel-agent-srmcopy-CERN-FNAL.properties.xml
    glite-transfer-channel-agent-urlcopy-CERN-NDGF.log-properties
    glite-transfer-channel-agent-urlcopy-CERN-NDGF.properties.xml
    glite-transfer-vo-agent-DTEAM.log-properties
    glite-transfer-vo-agent-DTEAM.properties.xml
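
The file names in the listing encode the agent type, the transfer type (for channel agents), the channel or VO name, and the kind of file. A small sketch that decodes them, with the pattern inferred only from the six names shown on the slide; other naming variants may exist, and the function name is hypothetical.

```python
import re

# Pattern inferred from the listing on the slide; treat it as an assumption.
AGENT_FILE_RE = re.compile(
    r"^glite-transfer-"
    r"(?:channel-agent-(?P<transfer>urlcopy|srmcopy)-(?P<channel>.+)"
    r"|vo-agent-(?P<vo>.+))"
    r"\.(?P<kind>properties\.xml|log-properties)$"
)


def parse_agent_file(filename):
    """Split an agent configuration file name into its components.

    Returns None when the name does not follow the inferred pattern.
    """
    match = AGENT_FILE_RE.match(filename)
    if match is None:
        return None
    if match.group("vo") is not None:
        return {"agent": "vo", "name": match.group("vo"), "kind": match.group("kind")}
    return {
        "agent": "channel",
        "transfer": match.group("transfer"),
        "name": match.group("channel"),
        "kind": match.group("kind"),
    }
```

For instance, `glite-transfer-channel-agent-srmcopy-CERN-FNAL.properties.xml` decodes to a channel agent for CERN-FNAL using srmcopy.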

15 .properties files
- xml files loaded by agents at startup
  - database connection parameters
  - channel and VO settings
- typically configured via YAIM
  - check installation and configuration guides
- many parameters moved to database

16 A quick CLI tour
Web service interfaces:
- ChannelManagement
  - glite-transfer-channel-add, glite-transfer-channel-setvolimit...
- FileTransfer
  - glite-transfer-submit, glite-transfer-list...
- gridsite delegation
  - glite-delegation-init, glite-delegation-destroy...

The delegation interface is probably much less used; it is listed here because it might be helpful for reproducing bugs or performing some tests.

17 CLI: (useful) common options
- Service location (-s)
- Service details (-v)
- Verbose flag (--verbose)

    ~ $ glite-transfer-submit -s .../FileTransfer -v --verbose ...
    # Using endpoint
    # Service version: 3.3.2
    # Interface version: 3.3.0
    # Schema version: 3.1.0
    # Service features: glite-data-transfer-fts
    # Client version: 3.4.2
    # Client interface version: 3.3.0
    Server supports delegation. Delegation will be used by default.
    Remaining time for local proxy is 11 hours and 59 minutes.
    Not bothering to do delegation, since the server already has a delegated credential for this user lasting longer than 4 hours.
    Remaining time on server for this credential is 11 hours and 57 minutes.
    7f3bbf5f-896f-11dd-824c-e41662be838f

Callouts: the lines starting with # are the service details, the delegation messages are the verbose output, and the last line is the job id.

It is always better to specify the service location, to avoid surprises. If the service location is not specified, the service found in services.xml is used (which might be fine if you are on a server machine); otherwise some smart behavior might be attempted (use at your own risk). For testing, debugging, etc., always print the service details, to make sure that you are using the service you actually want to use.

18 Job submission

    ~ $ voms-proxy-init ...
    ~ $ glite-transfer-submit srm://srm-dteam.cern.ch:8443/srm/managerv2?SFN=/foo srm://srm-dteam.cern.ch:8443/srm/managerv2?SFN=/bar
    7f3bbf5f-896f-11dd-824c-e41662be838f
    ~ $ glite-transfer-status -l 7f3bbf5f-896f-11dd-824c-e41662be838f
    Failed
      Source: srm://...?SFN=/foo
      Destination: srm://...?SFN=/bar
      State: Failed
      Retries: 1
      Reason: SOURCE error during PREPARATION phase: [GENERAL_FAILURE] blah blah...
      Duration: 0

Callouts, in order: create a proxy, submit job, poll status, job state, file details (-l option).
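
Applications that submit a job usually poll its status until it reaches a final state. The loop below is a generic sketch of that pattern. The `get_status` callable is hypothetical (it could, for example, wrap glite-transfer-status), and apart from "Failed", which appears on the slide, the set of terminal state names is an assumption.

```python
import time

# Only "Failed" is shown on the slide; the other names are assumptions.
TERMINAL_STATES = {"Finished", "FinishedDirty", "Failed", "Canceled"}


def wait_for_job(job_id, get_status, interval=30.0, max_polls=100, sleep=time.sleep):
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is any callable returning the current job state as a string.
    Raises TimeoutError if the job is still running after max_polls checks.
    """
    for _ in range(max_polls):
        state = get_status(job_id)
        if state in TERMINAL_STATES:
            return state
        sleep(interval)
    raise TimeoutError("job " + job_id + " did not finish after " + str(max_polls) + " polls")
```

Passing the status function and the sleep function as parameters keeps the loop testable without a running FTS service.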

19 Jobs listing

    ~ $ glite-transfer-submit ...
    212d1116-8a19-11dd-ad25-efb5df4948c3
    ~ $ glite-transfer-list -u "$MY_DN"
    212d1116-8a19-11dd-ad25-efb5df4948c3 Ready
    ~ $ glite-transfer-list -o dteam
    009efc11-94db-11dc-af46-d1a42b7e2b52 Ready
    00dea dc-af46-d1a42b7e2b52 Ready
    0161bdbf-94bb-11dc-bfac-c d9 Ready
    c dc-bfac-c d9 Ready
    03ad21ee dc-b96e-86619d08b Ready
    042d77f9-973d-11dc-bfac-c d9 Ready
    0573c634-94bf-11dc-bfac-c d9 Ready
    0a789eea-8de0-11dc-bfac-c d9 Active
    0b1ddf3d-8de0-11dc-b96e-86619d08b Active
    0f83bea dc-af46-d1a42b7e2b52 Ready

Callouts: -u restricts the listing to your own DN; -o restricts it to a VO.

Technical note: in the current version, listing VO transfers is allowed only for VO admins. There is an open bug about this, and the behavior will be changed in one of the next releases.
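
A listing like the one above is easy to summarize per state. The following sketch assumes only what the slide shows, namely one job per line with the state as the last whitespace-separated token; the function name is hypothetical.

```python
from collections import Counter


def count_states(listing):
    """Count jobs per state in 'job-id state' listing output.

    Lines that do not have at least two tokens (blank lines, a bare job id)
    are ignored.
    """
    counts = Counter()
    for line in listing.splitlines():
        parts = line.rsplit(None, 1)
        if len(parts) == 2:
            counts[parts[1]] += 1
    return counts
```

Applied to the dteam listing on the slide, this would report how many jobs are Ready and how many are Active.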

20 Channel Administration

    ~ $ glite-transfer-channel-list -s .../ChannelManagement
    CERN-CERN
    CERN-ASGC
    ...
    ~ $ glite-transfer-channel-list -x CERN-CERN
    Channel: CERN-CERN
    Between: CERN-PROD and CERN-PROD
    State: Active
    Contact:
    Bandwidth: 0
    Nominal throughput: 10
    Number of files: 20, streams: 1
    TCP buffer size: default
    Message: Restarted for Castor intervention
    Last modification by: /DC=ch/DC=cern/OU=computers/CN=fts114.cern.ch
    Last modification time: :10:12
    Number of VO shares: 6
    VO 'alice' share is: 100 and is limited to 5 transfers
    VO 'atlas' share is: 100 and is limited to 20 transfers

Callouts: -x gives the detailed listing; "Between" shows the source and destination sites; "Number of files" is the maximum number of transfers; the last lines show the VO shares and limits.
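
For scripting, the VO share lines of the detailed listing can be picked out with a regular expression. This sketch assumes the exact wording shown on the slide ("VO '...' share is: N and is limited to M transfers"); other client versions may format the line differently, and the function name is hypothetical.

```python
import re

# Wording copied from the slide's sample output; an assumption for other versions.
SHARE_RE = re.compile(r"VO '([^']+)' share is: (\d+) and is limited to (\d+) transfers")


def parse_vo_shares(output):
    """Map each VO to its (share, transfer limit) pair found in the output."""
    return {
        vo: (int(share), int(limit))
        for vo, share, limit in SHARE_RE.findall(output)
    }
```

For the two lines shown on the slide this gives a share of 100 for both alice and atlas, with limits of 5 and 20 transfers respectively.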

21 Channel Administration 2

    ~ $ glite-transfer-channel-add -f 20 -T 1 -S Active CERN-CERN CERN-PROD CERN-PROD
    ~ $ glite-transfer-channel-setvoshare CERN-CERN alice 100
    ~ $ glite-transfer-channel-setvolimit CERN-CERN alice 5
    ~ $ glite-transfer-channel-listmanagers CERN-CERN
    /DC=ch/DC=cern/OU=Organic Units/.../CN=Paolo Badino
    /DC=ch/DC=cern/OU=Organic Units/.../CN=Gavin Mccance
    ~ $ glite-transfer-channel-addmanager CERN-CERN /DC=ch/DC=cern/OU=Organic Units/.../CN=Paolo Tedesco

Callouts, in order: create channel, set shares and limits, list and add managers.

22 VO Administration

    ~ $ glite-transfer-listvomanagers -s .../FileTransfer dteam
    /DC=ch/DC=cern/OU=Organic Units/.../CN=Paolo Badino
    /DC=ch/DC=cern/OU=Organic Units/.../CN=Gavin Mccance
    ~ $ glite-transfer-addvomanager dteam /DC=ch/DC=cern/OU=Organic Units/.../CN=Paolo Tedesco
    ~ $ glite-transfer-setpriority 7f3bbf5f-896f-11dd-824c-e41662be838f 5

Callouts: FileTransfer interface; requires VO admin role.

23 Questions?
- fts-support@cern.ch: configuration issues, software problems...
- ftscern.support@cern.ch: operational issues related to CERN FTS services. Open list. Announcements are also sent to this list.

