Torrent-based Software Distribution in ALICE.


1 Torrent-based Software Distribution in ALICE
Costin.Grigoras@cern.ch

2 Outline
GDB, Annecy 10.10.2012
- Motivation
- How it works
- Site requirements
- History
- Migration status

3 Motivation
- ALICE was using site shared areas for installing the precompiled experiment software packages
- Large sites suffered from AFS/NFS/… scalability issues, and the shared area was a single point of failure
- Large space was needed for the many active versions
- The old model needed a site-local service to manage the installation, unpacking and deletion of the packages
- Strict site configuration was required to support operation, which excludes the use of ‘opportunistic’ resources/centres
- From the very beginning, the shared SW area and its access from the VO-box were considered a security risk
- All of the above and more are solved by using the Torrent protocol to distribute the software packages

4 Torrent terminology
[Diagram; labels as shown on the slide]
- package.tar.gz: chunks of equal size
- package.tar.gz.torrent: metadata of the original file
  -SHA1 of chunks
  -SHA1 of entire file
  -Tracker location
- Tracker
- Clients: initial seeder, seeder, leech
- Interactions shown in the diagram: get file info; advertise hashes of complete chunks; exchange chunks; prefer high-speed peers
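
The chunk hashing named on this slide can be illustrated with a minimal Python sketch. It is not the torrent file format itself: it only splits a byte string into fixed-size chunks (the last one may be shorter) and computes the SHA1 of each, which is the information a client needs to verify a chunk received from a peer. The function names `chunk_hashes` and `chunk_is_valid` are hypothetical.

```python
import hashlib
from typing import List


def chunk_hashes(data: bytes, chunk_size: int) -> List[str]:
    """Return the SHA1 hex digest of each fixed-size chunk of data."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [
        hashlib.sha1(data[offset:offset + chunk_size]).hexdigest()
        for offset in range(0, len(data), chunk_size)
    ]


def chunk_is_valid(chunk: bytes, index: int, expected: List[str]) -> bool:
    """Check a downloaded chunk against the hash list from the metadata."""
    return hashlib.sha1(chunk).hexdigest() == expected[index]
```

Because every chunk is checked independently, a client can accept chunks from any peer without trusting it, and can advertise a chunk to others as soon as that chunk has been verified.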

5 How it works
[Diagram; labels as shown on the slide]
- Build servers
- Software repository (one tar.gz / version)
- AliEn file catalogue: torrent://alitorrent.cern.ch/…
- Torrent tracker: alitorrent.cern.ch:8088
- Torrent seeder: alitorrent.cern.ch:8092
- Site X: WN 1, WN 2, WN n
- Site Y: WN 1, WN 2, WN n
- No seeding between sites

6 How it works (2)
- Build servers for SLC5 (32b, 64b), SLC6 (32b, 64b), Mac OS X, Ubuntus …
- Software repository: 150GB in 600 archives
  - Total size of a compressed (4x factor) software ‘set’ per job is ~300MB (this is what is downloaded to the WN)
- One central tracker and seeder
  - Limited to 50MB/s to the world
- Fallback to other download methods if the torrent download fails for any reason
  - wget, xrdcp
  - But seed them nevertheless
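
The fallback behaviour on this slide can be sketched as a simple ordered chain. This is an illustration under assumptions, not the AliEn code: `fetch_with_fallback`, the `Fetcher` type and the `seed` callback are hypothetical names, and each fetcher stands in for one download method (torrent, wget, xrdcp) that raises an exception on failure.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical fetcher type: takes a destination path, raises on failure.
Fetcher = Callable[[str], None]


def fetch_with_fallback(
    dest: str,
    fetchers: List[Tuple[str, Fetcher]],
    seed: Callable[[str], None],
) -> Optional[str]:
    """Try each download method in order; return the name of the one that worked."""
    for name, fetch in fetchers:
        try:
            fetch(dest)
        except Exception:
            continue  # any failure moves on to the next method
        seed(dest)  # seed the file regardless of which method delivered it
        return name
    return None  # every method failed
```

Calling `seed` after any successful method reflects the "seed them nevertheless" point: a package obtained by a fallback method can still be offered to other peers.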

7 How it works (3)
- Bootstrap
  - Pilot job script fetches the latest AliEn build by Torrent (20MB) and installs it on the local node (`pwd`)
- AliEn JobAgent gets a real job from the central queue and downloads the required software packages
  - It continues to seed them in the background so that other local agents can quickly get them over the LAN
- The JA will run more jobs of the same type (user and SW requirements) within the TTL of the job
- Everything is downloaded into the sandbox of the job, so it is wiped at the end of its execution
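
The agent life cycle described on this slide can be outlined as a loop. The sketch below is a simplification under assumptions: `run_job_agent` and its callbacks (`next_job`, `install`, `execute`) are hypothetical, a job is modelled as a dict with a `"packages"` list, and the whole agent lifetime is treated as one temporary sandbox directory.

```python
import os
import tempfile
import time
from typing import Callable, Optional, Set, Tuple


def run_job_agent(
    next_job: Callable[[], Optional[dict]],
    install: Callable[[str, str], None],
    execute: Callable[[str, dict], None],
    ttl_seconds: float,
    clock: Callable[[], float] = time.monotonic,
) -> Tuple[int, str]:
    """Run jobs until the TTL expires or the queue has nothing more to offer."""
    deadline = clock() + ttl_seconds
    installed: Set[str] = set()
    jobs_done = 0
    # Everything lives in the sandbox and disappears when the agent exits.
    with tempfile.TemporaryDirectory() as sandbox:
        while clock() < deadline:
            job = next_job()
            if job is None:
                break
            for package in job["packages"]:
                if package not in installed:  # reuse earlier downloads
                    install(sandbox, package)
                    installed.add(package)
            execute(sandbox, job)
            jobs_done += 1
    return jobs_done, sandbox
```

Packages already present in the sandbox are not fetched again, which is why running several jobs of the same type within one TTL is cheap.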

8 Torrent features we use
- Clients explicitly publish their private IP in the central tracker
  - Allowing the discovery of LAN peers via this common service even behind NAT
- Local Peer Discovery
  - Multicast to discover peers on the same network
- Peer exchange
  - Peer lists are distributed between the local peers
- Distributed Hash Tables
  - Decentralized seeder lookup – seeders are trackers
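
Since the slides name aria2c as the client, the four features above map naturally onto documented aria2c options. The sketch below only builds an argument list and does not run anything; the helper name `aria2c_args` is hypothetical, and whether ALICE invokes aria2c with exactly these options is an assumption.

```python
from typing import List


def aria2c_args(torrent_file: str, sandbox_dir: str, private_ip: str) -> List[str]:
    """Build an aria2c command line enabling the features listed on the slide."""
    return [
        "aria2c",
        f"--dir={sandbox_dir}",            # download into the job sandbox
        f"--bt-external-ip={private_ip}",  # IP address reported to the tracker
        "--bt-enable-lpd=true",            # Local Peer Discovery (multicast)
        "--enable-peer-exchange=true",     # peer exchange between peers
        "--enable-dht=true",               # Distributed Hash Table lookup
        torrent_file,
    ]
```

A list built this way could be handed to `subprocess.run` on a worker node; reporting the private address is what lets peers behind the same NAT find each other through the central tracker.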

9 Site requirements
- How to allow this to happen
  - iptables rules accepting:
    - Outgoing to alitorrent.cern.ch TCP/8088,8092
    - WN-to-WN on TCP, UDP / 6881:6999 – aria2c default listening ports
    - UDP, IGMP -> 224.0.0.0/4 – local peer discovery
  - Typically this is already the case; in some cases the ports had to be whitelisted (very smart firewalls)
  - Implicitly, sites do not exchange any torrent traffic between them
- No service to run on the site or on the machines, no shared area any more, no single point of failure, essentially no local support for this
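
The three traffic classes listed above can be restated as a small classifier, which may help when checking a firewall policy against them. This is only a sketch of the rules as written on the slide, not an iptables configuration; `matching_rule`, the rule labels and the `same_site` flag are hypothetical.

```python
import ipaddress
from typing import Optional

CENTRAL_HOST = "alitorrent.cern.ch"
CENTRAL_TCP_PORTS = {8088, 8092}   # tracker and seeder
PEER_PORTS = range(6881, 7000)     # 6881:6999 inclusive
MULTICAST_NET = ipaddress.ip_network("224.0.0.0/4")


def _is_multicast(dest: str) -> bool:
    """True if dest is an IPv4 address inside 224.0.0.0/4."""
    try:
        return ipaddress.ip_address(dest) in MULTICAST_NET
    except ValueError:  # dest is a host name, not an IP address
        return False


def matching_rule(proto: str, dest: str, port: Optional[int], same_site: bool) -> Optional[str]:
    """Return which required rule a flow falls under, or None if none applies."""
    if proto == "tcp" and dest == CENTRAL_HOST and port in CENTRAL_TCP_PORTS:
        return "central"
    if same_site and proto in ("tcp", "udp") and port in PEER_PORTS:
        return "wn-to-wn"
    if proto in ("udp", "igmp") and _is_multicast(dest):
        return "local-peer-discovery"
    return None
```

Peer traffic on 6881:6999 is accepted only when `same_site` is true, which matches the statement that sites do not exchange torrent traffic between them.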

10 History
- The deployment has faced only policy difficulties
  - Eventually accepted after understanding the technology
  - There is no evil technology, only evil use…
- First tests at CERN in 02.2009
- Site deployments starting 06.2009
  - As the shared areas were proving insufficient
  - First at the large sites, in operation for 2 years
- Presented in various forums within the collaboration and at CHEPs
- Large awareness call in 01.2012 at the ALICE T1/T2 Workshop in Karlsruhe

11 Migration status
- First transitions done in close collaboration with the sites
  - Debugging on the WNs, following up the consequences on the local network, firewalls and such
- One month ago we asked all sites for permission to enable torrent
  - Most have confirmed that their policy allows the torrent protocol, checked the firewall policies, and now run torrent
  - Working with the rest to solve the (mostly) non-technical issues
  - Some mails went to unread mailboxes …

12 Migration status
- T0 – in operation for 3 years
- T1s – 5 / 6 migrated
- T2s – 36 / 78 migrated
- Currently covering 2/3 of the resources, so on average more than 20K concurrent jobs are using torrent
  - Rock solid, very efficient technology
  - No incidents reported
- Aiming for full migration by the time the next AliEn version is deployed, to completely drop the PackMan VoBox service and the need for shared SW area and caches

13 Conclusion
- Torrents have enabled us to
  - Simplify site operations by removing a VoBox service and the shared SW areas
  - Significantly reduce problems associated with SW deployment, relieving the sites' support staff
  - Have quick software release cycles (both experiment and Grid middleware)
- The migration process was carefully staged
  - Policy limitation clarified – discussion with security experts
  - Discussions and deployment at T0/T1s and selected T2s (regional coverage)
  - Presently – towards complete site coverage
- Lifts some of the requirements for a site VoBox, specific configurations and services
  - Forward-looking system - towards opportunistic use of resources and clouds!

