ELASTIC LSF EXTENSION AT CNAF. credits Vincenzo Ciaschini Stefano Dal Pra Andrea Chierici Vladimir Sapunenko Tommaso Boccali.


1 ELASTIC LSF EXTENSION AT CNAF

2 credits Vincenzo Ciaschini Stefano Dal Pra Andrea Chierici Vladimir Sapunenko Tommaso Boccali

3 idea
- Try to extend CNAF’s LSF to opportunistic and commercial resources
- Opportunistic: Tier3 Bologna, with OpenStack resources
  - Outside CNAF’s firewall
- Commercial: tests with Aruba (second Italian cloud provider)
- This talk covers only the second (commercial) case

4 Aruba
- ~20 MW installed in Italy
- O(50) MW installed in France, Germany, UK, CZ
- Not in the compute-crunching business; mostly serving as DB/Web/etc. provider for big clients
- Centers are large in installed CPU, quite modest in storage
  - At or below a CMS Tier1
- Using VMware vSphere as virtualization engine
- We are testing with the center in Arezzo (~150 km from CNAF), which is O(6 MW) and is connected via 4x20 Gbit/s links to commercial network providers

5 Key points
- We do not have to worry about job lifetime
  - vSphere can decrease the “virtual clock” of a machine to O(100 MHz)
  - No jobs die, they just become very slow; no sockets get closed
- RAM and (local VM) disk are not problematic; the host machines are very rich
- At the moment, no “fee” for networking
  - I can only guess this holds up to the point where we disturb the real activities
- CPU usage by “real customers” is O(10%); we are trying to use the rest, and accept being clocked down when the real customers need CPU cycles
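As a rough illustration of why short clock-downs are tolerable, the sketch below estimates the wall time of a job that spends some seconds throttled. It assumes job progress scales linearly with clock frequency, which is a simplification; the 2500 MHz nominal clock and the function name are hypothetical, and only the O(100 MHz) throttled clock comes from the slide.

```python
def wall_time_with_throttling(nominal_seconds, throttled_seconds,
                              nominal_mhz, throttled_mhz):
    """Estimate total wall time for a job that is throttled for a while.

    nominal_seconds: wall time the job would need at the full clock.
    throttled_seconds: wall time spent at the reduced "virtual clock".
    Assumption: progress is proportional to clock frequency.
    """
    if nominal_mhz <= 0 or throttled_mhz <= 0:
        raise ValueError("clock frequencies must be positive")
    speed_ratio = throttled_mhz / nominal_mhz
    # Work (in full-speed seconds) completed during the throttled period.
    work_done_while_throttled = throttled_seconds * speed_ratio
    if work_done_while_throttled > nominal_seconds:
        raise ValueError("throttled period is longer than the job itself")
    # The remaining work runs at full speed.
    return throttled_seconds + (nominal_seconds - work_done_while_throttled)


# Hypothetical one-hour job, throttled to 100 MHz for 30 seconds
# on a host with an assumed 2500 MHz nominal clock.
estimate = wall_time_with_throttling(3600, 30, 2500, 100)
print(f"estimated wall time: {estimate:.1f} s")
```

Under these assumptions the job loses almost all of the throttled seconds but nothing more, since it is never killed or restarted.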

6 Elastic Expansion of CNAF
- Each machine gets a private IP in Aruba’s network
- We set up a “tun” kernel tunnel to CNAF’s CEs and ARGUS
- At CNAF there is the receiver of this tunnel, which also assigns tunnel IPs to the Aruba machines
- We host in Aruba:
  - A squid for Frontier/CVMFS caching
  - A GPFS AFM client
    - It is a special GPFS client which “accepts” high-latency connections and caches the results (read-only)
    - This is used to serve the LSF environment
- Each WN uses the same image used at CNAF (extracted from Quattor), with a special config which points to the services above
  - CVMFS configs
  - Special SITECONF for CMS
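The address-assignment role of the tunnel receiver can be sketched as a small lease table. This is only an illustration of the idea, not the actual CNAF implementation: the class name, the 10.200.0.0/29 tunnel subnet, and the 192.168.x.x private addresses are all hypothetical.

```python
import ipaddress


class TunnelAddressPool:
    """Hands out tunnel IPs to remote machines, keyed by their private IP.

    Hypothetical sketch: the first usable address is reserved for the
    receiver side of the tunnel, the rest are leased to remote nodes.
    """

    def __init__(self, cidr):
        network = ipaddress.ip_network(cidr)
        hosts = iter(network.hosts())
        self.receiver_ip = next(hosts)   # receiver end of the tunnel
        self._free = hosts               # remaining addresses, in order
        self._leases = {}                # private IP -> tunnel IP

    def assign(self, private_ip):
        """Return the tunnel IP for a remote machine, leasing one if needed."""
        key = ipaddress.ip_address(private_ip)
        if key not in self._leases:
            try:
                self._leases[key] = next(self._free)
            except StopIteration:
                raise RuntimeError("tunnel address pool exhausted") from None
        return self._leases[key]


pool = TunnelAddressPool("10.200.0.0/29")   # assumed tunnel subnet
print(pool.receiver_ip)                      # receiver side
print(pool.assign("192.168.1.10"))           # first remote worker node
print(pool.assign("192.168.1.10"))           # same node gets the same lease
```

Keying the lease on the machine's private IP keeps the assignment stable if a node reconnects, which matters when the CEs and ARGUS need to reach the same worker node again.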

7 [Architecture diagram with three areas: Aruba, CNAF, Rest of the world. Labels: CE, LSF master, ARGUS; SQUID, AFM; Conditions, CVMFS; Data via Xrootd, StageOut via SRM; Pilot/glideInWMS/Condor/cmsRun; Tunnel]

8 So …
- It works: we finished testing all the pieces, and now we would like to ramp it up
- For the moment, all “free”
  - Still to understand/discuss the economic model
- In the end, what simplifies the picture a lot is:
  - No need to kill/restart machines. Slowing down for seconds seems to be quite acceptable for our jobs
  - We are not paying for network (that is their model for all customers, not just in this phase)
    - Still, it is unclear which fraction of the 80 Gbit/s we could try to use
    - If all the rest makes sense, we could bring a private GARR link there (just speculation at the moment)
- They offered the possibility to host machines at their site ($=?)
  - We are thinking of sending an O(200 TB) xrootd caching proxy there if the other tests are OK
- Speaking about the AWS test: what we did seems well replicable on AWS (indeed, already working on OS)
  - We would like to have the chance to test it there as well
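To put the open bandwidth question in numbers, the sketch below estimates how long it would take to move a given data volume over a fraction of the 80 Gbit/s aggregate. The usable fraction is exactly what is unknown, so the 10% used here is purely an assumed value; the calculation uses decimal terabytes and ignores protocol overhead.

```python
def transfer_hours(volume_tb, link_gbit_s, usable_fraction):
    """Hours needed to move volume_tb terabytes over part of a link.

    Assumptions: 1 TB = 1e12 bytes, no protocol overhead, and a constant
    usable fraction of the nominal link rate.
    """
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    total_bits = volume_tb * 1e12 * 8
    rate_bits_per_s = link_gbit_s * 1e9 * usable_fraction
    return total_bits / rate_bits_per_s / 3600


# Filling an O(200 TB) cache once, using an assumed 10% of 80 Gbit/s.
print(f"{transfer_hours(200, 80, 0.10):.1f} hours")
```

Varying `usable_fraction` gives a quick feel for how sensitive a cache warm-up would be to whatever share of the links turns out to be available.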

