INFSO-RI-508833 Enabling Grids for E-sciencE FTS Administrators Tutorial for Tier-2s Paolo Badino

Presentation transcript:

INFSO-RI Enabling Grids for E-sciencE FTS Administrators Tutorial for Tier-2s Paolo Badino Gavin McCance WLCG Asian Tier-2 Workshop - Mumbai, 2 December 2006

Outline
FTS Overview
Tier-2 Perspective
  – What transfers will I see and from where?
  – What control do I have over my channels?
  – How do I set up the client software?
Debugging problems
  – What errors/tickets will I see and how do I handle them?
  – What else do I need to check for my site?
Reporting & Monitoring

FTS overview
gLite File Transfer Service is a data movement fabric service
  – It is a multi-VO service, used to balance usage of site resources according to VO and site policies
Why is it needed?
  – For the user, the service it provides is the reliable point-to-point movement of files identified by Storage URLs (SURLs)
  – For the site manager, it provides a reliable and manageable way of serving file movement requests from their VOs
  – For the VO manager, it provides the ability to control requests coming from their users
      • Re-ordering, prioritization, …
  – The focus is on the “service”
      • It should make it easy to do these things well

FTS Architecture
FTS Web Services
  – User: FileTransfer
  – Administration: ChannelManagement
File Transfer Queue
  – Oracle DB
File Transfer Agents
  – VO Agents
  – Channel Agents
Monitoring Tools

Management concept: channel
For management ease, the service supports splitting jobs into multiple “channels”
  – Submitted jobs are assigned to a suitable channel for serving
A channel may be:
  – A point-to-point network link
      • Dedicated channels
      • (e.g. we manage all the T0 to T1 links in LCG on a separate channel)
  – Various “catch-all” channels
      • Non-dedicated channels
      • (e.g. everything else coming to me, or everything to one of my tier-2 sites)
  – More flexible channel definitions are on the way (but not there yet)
Channels are uni-directional

Management concept: channel
“Channel”: it’s not a great name
  – It isn’t tied to a physical network path
  – It’s just a management concept
  – “Queue” might be better
All file transfer jobs on the same channel are served as part of the same queue
  – Inter-VO priorities for the queue
  – Intra-VO priorities within a VO
Each channel has its own set of transfer parameters
  – Number of concurrent transfers, number of streams, TCP buffer size, etc.
The WLCG model assigns each FTS server responsibility for the transfers it is supposed to manage
  – Channels allow you to split up the management of the service as you see fit

Experiment use-cases
What use-cases does FTS support and how do we deploy the servers to do this?
Primary use-cases from experiment computing models
  – tier-0 export to tier-1
  – tier-1 import into tier-0
  – tier-1 to tier-1 data transfer
  – tier-2 upload of data to associated tier-1: MC upload
  – tier-1 push of data to associated tier-2: AOD
Secondary use-cases
  – non-associated tier-2 to tier-1: backup MC upload
  – non-associated tier-1 to tier-2: AOD?
  – tier-2 to tier-2: ?

WLCG deployment model
Deploy only at tier-0 CERN and tier-1 sites
  – Ease of operations: put the service where the support is
  – Simplifies the client's job of deciding “who do I submit to?”
  – Does lead to some “odd” channel definitions and introduces some restrictions on who can control what
Which FTS servers are responsible for which transfers?
  – There are a few basic rules
Described in: – –

WLCG deployment model
1. Tier-0: if it involves CERN, then CERN’s FTS does the transfer
  – This covers tier-0 to tier-1 and tier-1 to tier-0.
2. Tier-1 sites: if you are the destination, your FTS is responsible for running the transfer
  – (if you’re the source, the other end is responsible)
3. Tier-2 sites: if you are the destination, your associated tier-1’s FTS is responsible for running the transfer
  – Regardless of who is the source
4. We prioritise control of writing over control of reading

WLCG deployment model
Primary use-cases from experiment computing models
  – tier-0 export to tier-1: CERN
  – tier-1 import into tier-0: CERN
  – tier-1 to tier-1 data transfer: the destination tier-1
  – tier-2 upload of data to associated tier-1: the associated tier-1
  – tier-1 push of data to associated tier-2: the associated tier-1
Secondary use-cases
  – non-associated tier-2 to tier-1: the destination tier-1
  – non-associated tier-1 to tier-2: the tier-1 associated to the destination tier-2
  – tier-2 to tier-2: the tier-1 associated to the destination tier-2
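
The responsibility rules above can be sketched as a small lookup. This is only an illustration of the rules as stated on the slides, not FTS code: the `Site` class, the `responsible_fts` function and the site names in the usage note are hypothetical. The tier-0 to tier-2 case is not listed on the slides; the sketch assumes the "if it involves CERN" rule applies to it as well.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Site:
    """Hypothetical description of a site, for illustration only."""
    name: str
    tier: int                               # 0, 1 or 2
    associated_tier1: Optional[str] = None  # only meaningful for tier-2 sites


def responsible_fts(source: Site, dest: Site) -> str:
    """Return the name of the site whose FTS server runs the transfer."""
    # Rule 1: if the transfer involves the tier-0 (CERN), CERN's FTS runs it.
    # Assumption: this is applied to any transfer touching the tier-0.
    if source.tier == 0:
        return source.name
    if dest.tier == 0:
        return dest.name
    # Rule 2: a tier-1 destination runs its own inbound transfers.
    if dest.tier == 1:
        return dest.name
    # Rule 3: for a tier-2 destination, its associated tier-1 runs the
    # transfer, regardless of who the source is.
    if dest.tier == 2:
        if dest.associated_tier1 is None:
            raise ValueError(f"tier-2 site {dest.name} has no associated tier-1")
        return dest.associated_tier1
    raise ValueError(f"unsupported tier: {dest.tier}")
```

For example, with a tier-2 `Site("T2-X", 2, "T1-A")` as destination, the function returns `"T1-A"` whichever site is the source, which is the "control of writing over control of reading" rule.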

Where will the transfer requests come from?
We prioritise control of writing over control of reading
All inbound transfers are controlled by your associated tier-1 site
  – You can control this
The majority of your outbound transfers will be uploads to your associated tier-1 site
  – You can control this, if your T1 has set up an explicit channel for you
You should expect some outbound traffic transferring to other tier-1 sites or to other tier-2 sites
  – These will be controlled by the other tier-1 sites
  – We offer no easy way to shut this off because of the way the channels are defined

Channels for tier-2: inbound
All inbound transfers are controlled by your associated tier-1 site
  – You can control this
For each of its associated tier-2s, a tier-1 sets up a channel STAR-TIER2
  – This channel will match any source to your site
Additionally, the associated tier-1 may wish to manage separately its transfers to the tier-2
  – This would be an explicit TIER1-TIER2 channel
      • If this channel is not defined, traffic from your associated tier-1 will be matched on the general STAR-TIER2 channel
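
The inbound matching described above (explicit channel first, catch-all otherwise) can be sketched as follows. This is a minimal illustration, not the FTS implementation: `match_channel` and the tuple representation of a channel are assumptions made for the example, and only the destination-side catch-all (STAR-TIER2) mentioned on the slide is modelled.

```python
from typing import Iterable, Optional, Tuple

# A channel is modelled as (source, destination); "STAR" matches any source.
Channel = Tuple[str, str]
STAR = "STAR"


def match_channel(source: str, dest: str,
                  channels: Iterable[Channel]) -> Optional[Channel]:
    """Pick the channel that would serve a source -> dest transfer, or None."""
    defined = set(channels)
    # An explicit point-to-point channel is used if it has been defined.
    if (source, dest) in defined:
        return (source, dest)
    # Otherwise fall back to the catch-all channel into the destination.
    if (STAR, dest) in defined:
        return (STAR, dest)
    # No channel on this server covers the transfer.
    return None
```

With both `("STAR", "TIER2")` and `("TIER1", "TIER2")` defined, traffic from the associated tier-1 lands on the explicit channel and everything else on STAR-TIER2; with only the catch-all defined, all inbound traffic shares one queue.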

Channels for tier-2: outbound
Upload to your tier-1. The tier-1 will define, either:
  – STAR-TIER1. To match any traffic to the tier-1. In this case, you have limited control, since switching you off as a source switches off everyone else as well
  – Or… an explicit TIER2-TIER1 channel. This allows you to control the reading from your site to your tier-1
Transfer to other tier-1s or tier-2s:
  – This will be managed by the non-associated tier-1 server.
  – Although “possible”, it is not practical to expect that tier-1 to manage a distinct channel for upload from all potential tier-2 sites
  – So in practice, you have no control of these transfers

Setting up the clients
Admin CLI:
  – Use the ChannelManagement port-type
  – glite-transfer-channel-*
User CLI:
  – Use the FileTransfer port-type
  – glite-transfer-*
Common options:
    -s  FTS endpoint
    -v  Verbose
    -h  Help

Setting up the client for admins
Site admins should use BDII to find the relevant FTS server (this will be your T1 site).
  – Each is published with the GOCDB site-name as the key.
    export LCG_GFAL_INFOSYS=lcg-bdii.cern.ch:2170
    export GLITE_SD_PLUGIN=bdii
    export GLITE_SD_SITE=<GOCSITENAME>
where GOCSITENAME is that of your tier-1
Test (with the verbose flag, which tells you which endpoint it is using):
    glite-transfer-channel-list -v

Setting up the client for users
This should be done as part of the gLite 3.0 UI / WN install
Generally it’s the same as for admins:
  – LCG_GFAL_INFOSYS, GLITE_SD_PLUGIN
  – GLITE_SD_SITE for the server the users want to submit to
or
  – Use the -s option (accepts URL, host or site name)
There is also a glite-transfer-discovery tool that returns the endpoint directly, given the source and destination SURLs or sites, e.g.
    glite-transfer-discovery CERN-PROD TAIWAN-LCG2

Common operations
The initial channel setup at the tier-1
Scheduled intervention on your SRM
Unscheduled intervention on your SRM
Changing the channel properties

Setup operations
Know who your tier-1 FTS service administrators are
Know which channels have been set up for you
  – STAR-TIER2, TIER1-TIER2, TIER2-TIER1
Check that the source / destination for these channels is using the correct GOC-DB name for your site
    glite-transfer-channel-list STAR-TIER2
Make sure that your local admin(s) have been set as channel managers for these channels
    glite-transfer-channel-listmanagers STAR-TIER2
If necessary, add a local admin as channel manager, e.g.
    glite-transfer-channel-addmanager STAR-TIER2 "/C=CH/O=CERN/OU=GRID/CN=Paolo Badino 3032"

How can I check my roles?
If the channel commands don’t work:
  – “You are not authorised as admin for this channel”
Make sure you are pointing at the correct FTS server:
    glite-transfer-channel-list -v
Check your roles:
    glite-transfer-getroles

Setting up VO shares
In collaboration with your tier-1, decide what VO shares there should be on each channel and set them
Check the current shares:
    glite-transfer-channel-list STAR-TIER2
Change or add them:
    glite-transfer-channel-setvoshare STAR-TIER2 dteam 10

Standard share mechanism
Example shares on a channel:
    VO      Share
    LHCb    100
    Atlas   100
    dteam   25
The share is decided as follows: find all the VOs with pending jobs and total their shares, e.g.
  – LHCb and dteam have jobs
  – Atlas has no jobs
  – Total = 100 + 25 = 125
  – dteam gets 25/125 = 1/5
  – LHCb gets 100/125 = 4/5
The share is calculated point-in-time w.r.t. current jobs. It does not use past data.
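
The arithmetic above can be written out as a short function. This is a sketch of the rule as described on the slide, not the FTS scheduler itself; the function name `effective_shares` and its arguments are made up for the example.

```python
from fractions import Fraction
from typing import Dict, Set


def effective_shares(configured: Dict[str, int],
                     vos_with_pending_jobs: Set[str]) -> Dict[str, Fraction]:
    """Fraction of the channel each VO gets right now.

    Only VOs that currently have pending jobs take part; the result is a
    point-in-time value and uses no history.
    """
    # Keep only the VOs that have pending jobs on the channel.
    active = {vo: share for vo, share in configured.items()
              if vo in vos_with_pending_jobs}
    total = sum(active.values())
    if total == 0:
        # Nobody has a non-zero share: no VO is served.
        return {vo: Fraction(0) for vo in active}
    return {vo: Fraction(share, total) for vo, share in active.items()}
```

Calling it with the shares from the slide (`{"LHCb": 100, "Atlas": 100, "dteam": 25}`) and pending jobs for LHCb and dteam reproduces the 4/5 and 1/5 split; Atlas does not appear because it has no jobs.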

Common operations
I’m fixing / upgrading / etc. my SRM (scheduled intervention). How to stop all transfers:
  – Know in advance which of your tier-1 channels affect you (and which you are able to control)
  – 10 to 15 minutes before the intervention:
      • Inform your tier-1 that you are pausing the channel(s)
        glite-transfer-channel-set -S Inactive
After the intervention:
  – Inform your tier-1 that you are restarting the channel(s)
        glite-transfer-channel-set -S Active

Common operations
My SRM has just gone down (or otherwise become unavailable). Unscheduled intervention:
  – The same procedure as for a scheduled intervention. If the SRM is down, set the channels that involve you directly to Inactive.
  – Do this as soon as you can after the problem with the SRM is noticed, to minimize the number of Failed jobs
      • You may find that your tier-1 site has noticed the problem first and switched you off already
  – Inform the tier-1 site that you are doing it.
  – Set the channels back to Active once the problem is fixed, informing the tier-1.

Common operations
A dedicated pool node is unavailable (or you need to schedule an intervention). How to stop all transfers for just one VO:
  – The same procedures as described before apply, but you can turn off a single VO
  – Instead of modifying the channel status, you can set the VO share to 0.
        glite-transfer-channel-list (to get the current value of the share)
        glite-transfer-channel-setvoshare 0
  – Inform the tier-1 site that you are doing it.
  – Set the share back to the previous value once the problem is fixed, informing the tier-1.
        glite-transfer-channel-setvoshare
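
The "note the current share, set it to 0, restore it afterwards" sequence can be modelled in a few lines. This sketch works on an in-memory dictionary only and does not talk to an FTS server; `vo_paused` and the `shares` table are hypothetical names, and on a real channel the changes would be made with glite-transfer-channel-setvoshare as described above.

```python
from contextlib import contextmanager
from typing import Dict, Iterator


@contextmanager
def vo_paused(shares: Dict[str, int], vo: str) -> Iterator[None]:
    """Temporarily set one VO's share to 0, then restore the old value."""
    previous = shares[vo]      # record the current share before changing it
    shares[vo] = 0             # share 0: the VO gets no transfers on the channel
    try:
        yield                  # the intervention happens here
    finally:
        shares[vo] = previous  # put the original share back, even on error
```

Recording the previous value before zeroing it is the important step: once the share is 0, the channel listing no longer tells you what it used to be.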

Common operations
Changing the transfer rate on the channel:
  – If you need to lower or increase the transfer rate to your site
      • Again, know in advance which of your tier-1 channels affect you (and which you are able to control). These are the same ones you can switch Active/Inactive
  – Check the current number of concurrent files:
        glite-transfer-channel-list
  – Set the number of concurrent files as you need:
        glite-transfer-channel-set -f 20
  – You can also change the number of streams for URLCOPY transfers
        glite-transfer-channel-set -T 5
      • Usually you need to experiment a bit to find the optimum setting

Summary
FTS is a Reliable Data Movement service and a Management tool
  – Channel concept
You can control your inbound transfers and (most) outbound transfers
The majority of your transfers should be controlled by your Tier-1’s FTS server
The client tools should be set up using BDII
Reviewed common service operations

Links
All release information and guides: – All the tutorial material is at: – FTS procedures (upgrading, moving, cleaning) – FTS FAQ – Workplan – Support list for user for administrator support (closed for community

Schedules
Agreed at recent FTS workshop (Sept. 2006)
Two main checkpoints:
  – Delegation + Schema updates
      • Code is mostly ready now, but not backwards compatible
      • Need to work on backwards compatibility at the client level
      • For internal test end November / start December
  – SRM v2.2
      • Should be ready for test end November
      • Will be tested with new SRM implementations on validation cluster from end November
      • But expect several incremental releases as we track and understand the SRM implementations

Next development priorities
Development focus is service stability and making the service easier to run
1. Improve the service reporting and monitoring
2. SRM/gridFTP communication split (improves the stability)
3. More flexible channel definitions
      • To make it easier to meet the needs of CMS and Alice
4. Site blacklisting
      • To avoid clogging up shared channels with bad sites