DRI Grant impact at the smaller sites
Pete Gronbech, September 2012, GridPP29, Oxford
2 GridPP29, Oxford, 26/9/12: Target Areas
Internal cluster networking
Cluster to JANET interconnect
Resilience and redundancy
3 Cluster Networking
Most sites' clusters have been interconnected at 1Gb/s.
As storage servers grew from ~20TB to 40TB, and to even larger 36-bay units with usable capacities of ~70TB, the network links had to be increased to cope with the number of simultaneous connections from worker nodes.
Many sites decided to use trunked or bonded 1Gb/s links, working on the basis of roughly one 1Gb/s link per 10TB.
This no longer scales for the very large servers, which have 6 bonded links.
The cost of gigabit networking starts to look high when you have to divide the number of ports on a switch by 6.
10Gbit switch prices are coming down.
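The rule of thumb on this slide can be worked through as a small calculation. This is a minimal sketch only: the function names and the 48-port switch size are assumptions for illustration, not figures from the talk, and the rule itself is described as rough (it gives 7 links for a ~70TB server, where the slide mentions 6).

```python
import math


def links_needed(capacity_tb, tb_per_link=10):
    """Bonded 1Gb/s links suggested by the ~one link per 10TB rule of thumb."""
    return math.ceil(capacity_tb / tb_per_link)


def servers_per_switch(switch_ports, links_per_server):
    """Storage servers one switch can attach when each server uses several ports."""
    return switch_ports // links_per_server


# Server sizes mentioned on the slide.
for capacity_tb in (20, 40, 70):
    print(capacity_tb, "TB ->", links_needed(capacity_tb), "x 1Gb/s links")

# Hypothetical 48-port gigabit switch, 6 bonded links per server.
print(servers_per_switch(48, 6), "servers per switch")
```

With six ports consumed per server, the effective port count of a gigabit switch shrinks by the same factor, which is the cost argument the slide makes for moving to 10Gbit.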
4 DRI Grant
Has allowed the sites to make the jump to 10Gbit switches in the cluster earlier than they had planned.
Has allowed some degree of future-proofing by providing enough ports to cover expected cluster expansion over the next few years.
Replacing bonded gigabit with 10Gbit simplifies and tidies up the cabling and configuration (less to go wrong, hopefully).
5 Campus connectivity
Many Grid clusters had 1 or 2Gbit connections to the campus WAN.
Many sites have used grant funding to install routers to allow connectivity to the backbone at 10Gbit.
If the campus backbone is made up of 10Gbit links, the danger is that the grid cluster could saturate some of these links, blocking other traffic to the JANET connection.
Links have to be doubled up on the route to the campus router.
The JANET connection to the university has to be increased, or the Grid link capped, to allow both Grid traffic and campus traffic to flow unhindered.
The alternative is to install a bypass link directly to the JANET router.
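The saturation concern and the capping option can be illustrated with simple arithmetic. This is a sketch under stated assumptions: the function names and the 3Gbit/s campus traffic figure are hypothetical, chosen only to show the reasoning, and real traffic is bursty rather than constant.

```python
def shared_link_saturated(link_gbps, grid_gbps, campus_gbps):
    """True if combined Grid and campus demand exceeds the shared link."""
    return grid_gbps + campus_gbps > link_gbps


def grid_cap_gbps(link_gbps, campus_reserved_gbps):
    """Largest Grid rate that still leaves the reserved bandwidth for campus traffic."""
    return max(0.0, link_gbps - campus_reserved_gbps)


link = 10.0    # shared 10Gbit link
campus = 3.0   # hypothetical campus demand in Gbit/s

# An uncapped 10Gbit cluster can fill the link on its own.
print(shared_link_saturated(link, grid_gbps=10.0, campus_gbps=campus))

# Capping the Grid link leaves room for campus traffic.
cap = grid_cap_gbps(link, campus)
print(cap, shared_link_saturated(link, grid_gbps=cap, campus_gbps=campus))
```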
6 Resilience
Where network upgrades could be purchased for less than anticipated, some funds were used to upgrade critical service nodes or infrastructure.
Storage server head nodes, caching servers, UPS or improved firewalls were items chosen by different institutes.
All sites were allocated some funds to purchase monitoring nodes. The original intention was to run Gridmon, but the plan changed to use PerfSonar.
The end result is that the Grid clusters at the sites are in a much stronger position than before and will provide robust
8 Site upgrades
Site | Cluster networking switches | Campus switches | Comments
Birmingham | Dell Force 10 S4810 and S60 switches plus NICs | Fibres provided to bypass the campus backbone to a separate 10Gbit JANET connection | Main Grid cluster will have 10Gbit connection to the PP part of the shared cluster.
Bristol | Cisco switches | Fibres and connections |
Brunel | Cisco Nexus 5596UP 10GE switches, and Cisco 3750E 1Gbit switches allowing 2 channel bonds to WNs | 4Gb/s of 10Gbit JANET connection. | CMS site. A 72TB cache server purchased to act as a buffer between pool nodes and WNs
Cambridge | Dell 8024F switches used to provide 10Gbps to SE head and pool nodes. | Full 10Gbps connectivity to JANET via new fibres |
Durham | HP 5412 gigabit switches with 8 10Gbps ports. | |
ECDF | IBM BNT G8264R | Cisco? |
9 Site upgrades (continued)
Site | Cluster networking switches | Campus switches | Comments
Glasgow | Extreme Summit X670V & X460-48t | | Mainly concentrated on cluster infrastructure
Imperial | 10Gb/s infrastructure for storage, connects to 40Gb/s college connection | |
Lancaster | Force 10 Z9000 & S4810 | |
Liverpool | Force 10 S4810 & S55, SolarFlare NICs | |
Manchester | Dell 8024 | 10Gbit campus connection |
Oxford | Force 10 S4810 | Cisco 4900M campus LAN switches | 10Gbit JANET link throttled to 5Gbps
12 Lancaster: That Network Upgrade...
The mad scramble network uplift plan for Lancaster took a 3-pronged approach.
1. Upgrade and shanghai the University's backup link. 10G (mostly) just for us.
2. Increase connectivity to the campus backbone, and thus between the two "halves" of the grid cluster and the local HEP cluster.
3. Add capacity for 10G networking to our cluster using a pair of Z9000 core switches and half a dozen S4810 rack switches.
This frees up some of the current switches, which can be retasked to improve the HEP cluster networking.
16 RHUL
Now have 2x1Gb/s links to JANET, trunked. The second link was added on 7th March but could not be utilised until the old 1Gb/s firewall was replaced.
Network upgraded from a stack of 8x Dell PC6248 (1Gb/s) to a 2x F10 S4810 10Gb/s spine, with the PC6248s attached as leaves by 2x10Gb/s to each F10.
Old 1Gb/s firewall is out of warranty/support, to be replaced soon with a Juniper SRX650 (7Gb/s max).
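One way to judge a spine-and-leaf layout like this is the oversubscription ratio of each leaf: total access bandwidth divided by total uplink bandwidth. The sketch below is illustrative only. It assumes 48 active 1Gb/s access ports per PC6248 leaf, and it shows both readings of "2x10Gb/s to each F10" (four uplinks per leaf in total, or two), since the slide does not say which applies.

```python
def oversubscription(access_ports, access_gbps, uplinks, uplink_gbps):
    """Ratio of leaf access bandwidth to leaf uplink bandwidth (1.0 = non-blocking)."""
    return (access_ports * access_gbps) / (uplinks * uplink_gbps)


# Assumption: 48 x 1Gb/s access ports in use on each leaf switch.
# Reading A: 2 x 10Gb/s to each of the two spine switches (4 uplinks per leaf).
ratio_four_uplinks = oversubscription(48, 1.0, uplinks=4, uplink_gbps=10.0)

# Reading B: 2 x 10Gb/s in total, one to each spine switch (2 uplinks per leaf).
ratio_two_uplinks = oversubscription(48, 1.0, uplinks=2, uplink_gbps=10.0)

print(f"4 uplinks: {ratio_four_uplinks:.1f}:1")
print(f"2 uplinks: {ratio_two_uplinks:.1f}:1")
```

Either reading gives a far lower ratio than a single stack sharing a 1Gb/s path, which is consistent with the slide's framing of the change as an upgrade.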
18 Sussex
Four 36-port InfiniBand switches
IB switches arranged in a fat-tree topology
19 Common Themes
Well-planned cluster networking, balanced and future-proof.
A vast improvement on the ad hoc, cost-limited designs they replaced.
The upgrades have brought tangible benefits.
20 FTS Transfer Rates
[Plots: FTS transfer rates to Oxford; FTS transfer rates from Oxford]
22 Benefits
August 2012: transfers of files to Oxford hit the 5Gbit rate cap for several hours.
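To put the rate cap in terms of data moved, the conversion from a sustained rate to a volume can be sketched as follows. The function name is hypothetical, the 3-hour duration is an assumed stand-in for "several hours", and decimal units (1 TB = 10^12 bytes) are used.

```python
def terabytes_transferred(rate_gbps, hours):
    """Data volume in TB moved at a sustained rate in Gbit/s (decimal units)."""
    bytes_per_second = rate_gbps * 1e9 / 8
    return bytes_per_second * hours * 3600 / 1e12


# One hour at the 5Gbit/s cap.
print(terabytes_transferred(5, 1), "TB per hour")

# Assumed 3-hour period at the cap.
print(terabytes_transferred(5, 3), "TB in 3 hours")
```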
23 Performance Tuning / Future
Now need to concentrate on improving FTS transfers to the remaining slow sites.
Good monitoring is required, both locally and nationally.
PerfSonar is being installed across the sites (see next talk).
Work with JANET and site networking teams to increase JANET connectivity where required.