Torrent-based Software Distribution in ALICE.

Presentation transcript:

Torrent-based Software Distribution in ALICE
GDB, Annecy

Outline
- Motivation
- How it works
- Site requirements
- History
- Migration status

Motivation
- ALICE was using site shared areas for installing the pre-compiled experiment software packages
- Large sites suffered from AFS/NFS/… scalability issues, and the shared area was a single point of failure
- Large space was needed for the many active versions
- The old model needed a site-local service to manage the installation, unpacking and deletion of the packages
- Requirement for strict site configuration to support operation; this excludes the use of 'opportunistic' resources/centres
- From the very beginning, the shared SW area and its access from the VO-box were considered a security risk
- All of the above and more are solved by using the Torrent protocol to distribute the software packages

Torrent terminology
[Diagram labels]
- package.tar.gz: split into chunks of equal size
- package.tar.gz.torrent: metadata of the original file
  - SHA1 of chunks
  - SHA1 of entire file
  - Tracker location
- Roles: tracker, initial seeder, seeder, leech
- Clients: get file info, advertise hashes of complete chunks, exchange chunks, prefer high-speed peers
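
The metadata listed above (per-chunk SHA1 plus a whole-file SHA1) can be illustrated with a minimal Python sketch. It is not the real .torrent format, which is bencoded and carries more fields; `describe_file` and `verify_chunk` are hypothetical helper names, and the chunk size is an arbitrary parameter chosen by the caller.

```python
import hashlib


def describe_file(data: bytes, chunk_size: int) -> dict:
    """Build torrent-style metadata: SHA1 per chunk plus SHA1 of the whole file."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    # Equal-size chunks; only the last one may be shorter.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    return {
        "length": len(data),
        "chunk_size": chunk_size,
        "chunk_sha1": [hashlib.sha1(c).hexdigest() for c in chunks],
        "file_sha1": hashlib.sha1(data).hexdigest(),
    }


def verify_chunk(meta: dict, index: int, chunk: bytes) -> bool:
    """Check a chunk received from a peer against the published hash."""
    return hashlib.sha1(chunk).hexdigest() == meta["chunk_sha1"][index]
```

Because each chunk is verified on its own, a client can accept chunks from any peer and start advertising a chunk as soon as its hash matches.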

How it works
[Diagram labels]
- Build servers
- Software repository (one tar.gz / version)
- AliEn file catalogue: torrent://alitorrent.cern.ch/…
- Torrent tracker: alitorrent.cern.ch:8088
- Torrent seeder: alitorrent.cern.ch:8092
- Site X: WN 1, WN 2, ..., WN n
- Site Y: WN 1, WN 2, ..., WN n
- No seeding between sites

How it works (2)
- Build servers for SLC5 (32b, 64b), SLC6 (32b, 64b), Mac OS X, Ubuntu, …
- Software repository: 150GB in 600 archives
  - Total size of a compressed (4x factor) software 'set' per job is ~300MB (this is what is downloaded to the WN)
- One central tracker and seeder
  - Limited to 50MB/s to the world
- Fallback to other download methods if the torrent download fails for any reason
  - wget, xrdcp
  - Packages fetched this way are seeded nevertheless
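
The fallback behaviour described above can be sketched as an ordered list of download methods tried in turn. This is only an illustration under stated assumptions: `fetch_with_fallback` and the fetcher callables are hypothetical, and the actual ALICE tooling is not shown in the slides.

```python
from typing import Callable, Iterable, Tuple

# A fetcher takes a package name and returns its bytes, or raises on failure.
Fetcher = Callable[[str], bytes]


def fetch_with_fallback(
    package: str, methods: Iterable[Tuple[str, Fetcher]]
) -> Tuple[str, bytes]:
    """Try each (name, fetcher) pair in order; return the first success."""
    errors = []
    for name, fetch in methods:
        try:
            return name, fetch(package)
        except Exception as exc:  # any failure moves on to the next method
            errors.append(f"{name}: {exc}")
    raise RuntimeError(
        f"all download methods failed for {package}: " + "; ".join(errors)
    )
```

A caller would pass the torrent fetcher first, followed by wrappers around wget and xrdcp, and would start seeding the file afterwards regardless of which method delivered it.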

How it works (3)
- Bootstrap
  - The pilot job script fetches the latest AliEn build by Torrent (20MB) and installs it on the local node (`pwd`)
- The AliEn JobAgent gets a real job from the central queue and downloads the required software packages
  - It continues to seed them in the background so that other local agents can get them quickly over the LAN
- The JA will run more jobs of the same type (user and SW requirements) within the TTL of the job
- Everything is downloaded into the sandbox of the job, so it is wiped at the end of its execution
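
The agent life cycle above (download once into a sandbox, reuse the packages for further jobs within the TTL, wipe everything at the end) can be sketched as follows. All names are hypothetical: `run_job_agent`, the `(package_names, estimated_seconds)` job shape and the `fetch_package` callable are assumptions made for illustration, not the AliEn JobAgent interface.

```python
import tempfile
from pathlib import Path


def run_job_agent(jobs, ttl_seconds, fetch_package):
    """Run queued jobs inside one throwaway sandbox until the TTL is used up.

    jobs: iterable of (package_names, estimated_seconds) tuples.
    fetch_package(name, directory): downloads a package into the sandbox.
    """
    completed = []
    remaining = ttl_seconds
    with tempfile.TemporaryDirectory() as sandbox:
        sandbox_path = Path(sandbox)
        installed = {}
        for packages, estimated_seconds in jobs:
            if estimated_seconds > remaining:
                break  # not enough time left in the TTL for another job
            for name in packages:
                if name not in installed:  # download each package only once
                    installed[name] = fetch_package(name, sandbox_path)
            completed.append(tuple(packages))
            remaining -= estimated_seconds
    # Leaving the with-block removes the sandbox and everything in it.
    return completed, sandbox_path
```

Keeping all downloads inside the sandbox is what makes the node stateless between pilots: nothing has to be cleaned up by a site service.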

Torrent features we use
- Clients explicitly publish their private IP in the central tracker
  - Allowing the discovery of LAN peers via this common service even behind NAT
- Local Peer Discovery
  - Multicast to discover peers on the same network
- Peer exchange
  - Peer lists are distributed between the local peers
- Distributed Hash Tables
  - Decentralized seeder lookup: seeders are trackers
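
Publishing private addresses in the tracker lets a client recognise peers on its own LAN. The Python sketch below shows one way such a filter could look; `lan_peers` is a hypothetical helper, and the assumption that the client knows its own address and prefix length is ours, not something stated in the slides.

```python
import ipaddress


def lan_peers(own_address: str, prefix_len: int, peers):
    """Return the peers whose advertised private address is on our own subnet."""
    # strict=False lets us derive the network from a host address.
    own_net = ipaddress.ip_network(f"{own_address}/{prefix_len}", strict=False)
    local = []
    for peer in peers:
        addr = ipaddress.ip_address(peer)
        if addr.is_private and addr in own_net:
            local.append(peer)
    return local
```

For example, a worker node at 10.1.2.15/24 would keep 10.1.2.20 from a tracker reply and ignore peers on other subnets or with public addresses.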

Site requirements
- How to allow this to happen
  - iptables rules accepting:
    - Outgoing to alitorrent.cern.ch TCP/8088,8092
    - WN-to-WN on TCP, UDP / 6881:6999 (aria2c default listening ports)
    - UDP, IGMP -> /4 (local peer discovery)
  - Typically this is already the case; in some cases the ports had to be whitelisted (very smart firewalls)
  - Implicitly, sites do not exchange any torrent traffic between them
- No service to run on the site or on the machines, no shared area any more, no single point of failure, essentially no local support needed for this
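
A site could verify the first rule (outgoing TCP to the central tracker and seeder) from a worker node with a short Python check like the one below. It is a sketch only: the host and ports are taken from the slide and may no longer be in service, the helper names are hypothetical, and it does not test the WN-to-WN or multicast rules.

```python
import socket

TRACKER_HOST = "alitorrent.cern.ch"  # central host named on the slide
CENTRAL_TCP_PORTS = (8088, 8092)     # tracker and seeder ports from the slide
PEER_PORT_RANGE = range(6881, 7000)  # 6881:6999 inclusive, WN-to-WN traffic


def can_connect(host, port, timeout=3.0):
    """Return True if an outgoing TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_central_services(host=TRACKER_HOST, ports=CENTRAL_TCP_PORTS):
    """Map each central port to whether it is reachable from this node."""
    return {port: can_connect(host, port) for port in ports}
```

Calling `check_central_services()` on a worker node returns a dictionary such as `{8088: True, 8092: False}`, which points directly at the port a firewall is blocking.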

History
- The deployment has faced only policy difficulties
  - Eventually accepted after understanding the technology
  - There is no evil technology, only evil use…
- First tests at CERN in […]
- Site deployments starting […]
  - As the shared areas were proving insufficient
  - First at the large sites, in operation for 2 years
- Presented in various forums within the collaboration and at CHEPs
- Large awareness call in […] at the ALICE T1/T2 Workshop in Karlsruhe

Migration status
- First transitions done in close collaboration with the sites
  - Debugging on the WNs, following up the consequences on the local network, firewalls and such
- One month ago we asked all sites for permission to enable torrent
  - Most have confirmed that their policy allows the torrent protocol, have checked the firewall policies, and now run torrent
  - Working with the rest to solve the (mostly) non-technical issues
  - Some mails went to unread mailboxes …

Migration status (2)
- T0: in operation for 3 years
- T1s: 5 / 6 migrated
- T2s: 36 / 78 migrated
- Currently covering 2/3 of the resources, so on average more than 20K concurrent jobs are using torrent
  - Rock solid, very efficient technology
  - No incidents reported
- Aiming for full migration by the time the next AliEn version is deployed, to completely drop the PackMan VoBox service and the need for shared SW area and caches

Conclusion
- Torrents have enabled us to
  - Simplify site operations by removing a VoBox service and the shared SW areas
  - Significantly reduce problems associated with SW deployment, relieving the sites' support staff
  - Have quick software release cycles (both experiment and Grid middleware)
- The migration process was carefully staged
  - Policy limitation clarified in discussion with security experts
  - Discussions and deployment at T0/T1s and selected T2s (regional coverage)
  - Presently: towards complete site coverage
- Lifts some of the requirement for a site VoBox, specific configurations and services
  - Forward-looking system: towards opportunistic use of resources and clouds!