Hosting Large-scale e-Infrastructure Resources Mark Leese


1 Hosting Large-scale e-Infrastructure Resources Mark Leese

2 Contents
- Speed-dating introduction to STFC
- Idyllic life, pre-e-Infrastructure
- Sample STFC-hosted e-Infrastructure projects
- RAL network re-design
- Other issues to consider

3 STFC
One of seven publicly funded UK Research Councils, formed from the 2007 merger of CCLRC and PPARC. STFC does a lot, including:
- awarding research, project & PhD grants
- providing access to international science facilities through its funded membership of bodies like CERN
- sharing its expertise in areas such as materials and space science with academic and industrial communities
...but it is mainly recognised for hosting large-scale scientific facilities, inc. High Performance Computing (HPC) resources

4 Harwell Oxford Campus
- STFC major shareholder in Diamond Light Source
- Electron beam accelerated to near light speed within a ring
- Resulting light (X-ray, UV or IR) interacts with samples being studied
- ISIS: a 'super-microscope' employing neutron beams to study materials at the atomic level

5 Harwell Oxford Campus
- STFC's Rutherford Appleton Lab is part of the Harwell Oxford Science and Innovation Campus, with UKAEA and a commercial campus-management company
- Co-locates hi-tech start-ups and multi-national organisations alongside established scientific and technical expertise
- Similar arrangement at Daresbury in Cheshire
- Both within George Osborne's Enterprise Zones:
  - reduced business rates
  - government support for roll-out of superfast broadband

6 Previous Experiences

7 Large Hadron Collider
- LHC at CERN: 16.5-mile ring
- Search for the elementary but (then) hypothetical Higgs boson particle
- Two proton (hadron) beams
- Four experiments (particle detectors): ATLAS, CMS, ALICE, LHCb
- Detector electronics generate data during collisions

8 LHC and Tier-1
- After initial processing, the four experiments generated 13 PetaBytes of data in 2010 (> 15m GB, or 3.3m single-layer DVDs)
- In the last 12 months, the Tier-1 received ≈ 6 PB from CERN and other Tier-1s
- GridPP contributes the equivalent of 20,000 PCs
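The slide's equivalences can be sanity-checked with quick arithmetic. A sketch, assuming binary petabytes (PiB), decimal gigabytes and 4.7 GB single-layer DVDs; the results land in the same ballpark as the slide's quoted figures, so its exact unit conventions presumably differ slightly:

```python
PB = 2**50       # binary petabyte (PiB) in bytes
GB = 10**9       # decimal gigabyte in bytes
DVD = 4.7 * GB   # single-layer DVD capacity

data = 13 * PB
print(round(data / GB / 1e6, 1), "million GB")    # 14.6 million GB
print(round(data / DVD / 1e6, 1), "million DVDs") # 3.1 million DVDs
```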

9 UK Tier-1 at RAL
[Diagram: LHC data arrives from CERN (Tier-0) and other Tier-1s over the LHC OPN (Optical Private Network) via primary and backup 10 Gbps lightpaths to the UKLight router; "normal" data and Tier-1-to-Tier-2 (university) traffic uses Janet via the Site Access Router, front-door firewall and Router A / internal distribution]
- Individual Tier-1 hosts route data to Router A or the UKLight router as appropriate
- Config pushed out with Quattor, the grid/cluster management tool
- Access Control Lists of IP addresses on the SAR, UKLight router and/or hosts replace firewall security (PetaBytes through a firewall?!?)
- As Tier-2 (university) network capabilities increase, so must RAL's (10 → 20 → 30 Gbps)
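The ACL-instead-of-firewall idea above can be sketched as follows. Everything here is hypothetical, not RAL's actual configuration: the peer networks are documentation ranges, and the rule format is a generic router-style permit/deny list; the only real detail assumed is that GridFTP's control channel listens on port 2811. A tool like Quattor would push lists like this out to hosts or routers:

```python
# Hypothetical LHC OPN peer networks (illustrative documentation ranges)
PEERS = {
    "CERN Tier-0":   "192.0.2.0/24",
    "Tier-1 peer A": "198.51.100.0/24",
    "Tier-1 peer B": "203.0.113.0/24",
}

def build_acl(peers, port=2811):  # 2811 = GridFTP control port
    """Render generic permit/deny lines: allow each peer, deny the rest."""
    lines = [f"permit tcp {net} any eq {port}  ! {name}"
             for name, net in peers.items()]
    lines.append("deny ip any any")  # implicit default: drop everything else
    return lines

for line in build_acl(PEERS):
    print(line)
```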

10 LOFAR
- LOw Frequency Array: world's largest and most sensitive radio telescope
- Thousands of simple dipole antennas, 38 European arrays
- 1st UK array opened at Chilbolton, Sept 2010
- 7 PetaBytes a year of raw data generated (> 1.5m DVDs)
- Data transmitted in real time to an IBM BlueGene/P supercomputer at the Uni of Groningen
- Data processed & combined in software to produce images of the radio sky

11 LOFAR
- 10 Gbps Janet Lightpath: Janet → GÉANT → SURFnet
- Big leap from FedEx'ing data tapes or drives - the 2011 RCUK e-IAG report notes "Southampton and UCL make specific reference... quicker to courier 1TB of data on a portable drive"
- Funded by LOFAR-UK
- cf. LHC: centralised, not distributed, processing
- Expected to pioneer the approach for other projects, e.g. the Square Kilometre Array

12 Sample STFC e-Infrastructure Projects

13 ICE-CSE
- International Centre of Excellence for Computational Science and Engineering
- Was going to be the Hartree Centre, now DFSC
- STFC Daresbury Laboratory, Cheshire; partnership with IBM
- Mission to provide HPC resources and develop software
- DL previously hosted HPCx, the big academic HPC machine before HECToR
- IBM BlueGene/Q supercomputer: 114,688 processor cores, 1.4 Petaflops peak performance
- Partner IBM's tests were the first time a Petaflop application had been run in the UK (one thousand trillion calculations per second)
- 13th in this year's TOP500 worldwide list; the rest of Europe appears five times in the Top 10; DiRAC and HECToR (Edinburgh) are 20th and 32nd

14 ICE-CSE
- DL network upgraded to support up to 8 × 10 Gbps lightpaths to the current regional Janet deliverer, Net North West, in Liverpool and Manchester
- Same optical fibres, different colours of light:
  1. 10G JANET IP service (primary)
  2. 10G JANET IP service (secondary)
  3. 10G DEISA (consortium of European supercomputers)
  4. 10G HECToR (Edinburgh)
  5. 10G ISIC (STFC-RAL)
- More expected as part of the IBM-STFC collaboration
- Feasible because NNW rents its own dark (unlit) fibre network: NNW 'simply' changes the optical equipment on each end of the dark fibre
- Key aim is for machine and expertise to be available to commercial companies. How? Over Janet?
- A Strategic Vision for UK e-Infrastructure estimates that 1,400 companies could make use of HPC, with 300 quite likely to do so - so even if some instead go for the commercial "cloud" option...

15 JASMIN & CEMS
- Joint Analysis System Meeting Infrastructure Needs
- JASMIN and CEMS funded by BIS, through NERC and through UKSA and ISIC respectively
- Compute and storage cluster for the climate and earth-system modelling community

16 JASMIN
- Big compute and storage cluster: 4.6 PetaBytes of fast disc storage
- JASMIN will talk internally to other STFC resources (compute + 500 TB)
- JASMIN will talk to its satellite systems (150 TB)
- JASMIN will talk to the Netherlands, the Met Office & Edinburgh over UKLight

17 CEMS in the ISIC
- Climate and Environmental Monitoring from Space: essentially JASMIN for commercial users
- Promotes use of 'space' data and technology within new market sectors
- Four consortia have already won funding from the publicly funded 'Space for Growth' competition (run by UKSA, TSB and SEEDA)
- Hosted in the International Space Innovation Centre, a not-for-profit formed by industry, academia and government; part of the UK's Space Innovation and Growth Strategy to grow the sector's turnover
- ISIC is an STFC 'Partner Organisation' in terms of the Janet Eligibility Policy, so...
- Janet-BCE (Business and Community Engagement) for network access related to academic and ISIC partners
- Commercial ISP for network access related to commercial customers
- As the industrial collaboration agenda is pushed, this needs to be controlled and applicable elsewhere in STFC

18 CEMS connectivity
[Diagram: ISIC router linked to the RAL infrastructure over 10 Gbps fibre, carrying Janet & Janet-BCE traffic on a Janet-BCE VLAN; commercial traffic reaches commercial customers via BT on a separate VLAN; the ISIC switch connects CEMS, with JASMIN on the RAL side]
- JASMIN and CEMS connected at 10 Gbps... but no Janet access for CEMS via JASMIN (no CEMS traffic permitted on that path)
- Keeping Janet-'permitted' traffic as a separate BCE VLAN allows tighter control
- Customers will access CEMS on different IP addresses depending on who they are (academia, partners, commercials), and this could be enforced
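The "different IP addresses per customer class" enforcement above could be sketched like this. The address plan is entirely hypothetical (documentation ranges, not real CEMS addressing); the point is only that classification by source network is mechanical and easy to enforce:

```python
import ipaddress

# Hypothetical CEMS-facing address plan (documentation ranges only)
CLASSES = {
    "academic":   ipaddress.ip_network("192.0.2.0/25"),
    "partner":    ipaddress.ip_network("192.0.2.128/26"),
    "commercial": ipaddress.ip_network("192.0.2.192/26"),
}

def classify(addr):
    """Return the customer class whose range contains addr, else 'denied'."""
    ip = ipaddress.ip_address(addr)
    for name, net in CLASSES.items():
        if ip in net:
            return name
    return "denied"

print(classify("192.0.2.10"))    # academic
print(classify("192.0.2.200"))   # commercial
print(classify("198.51.100.1"))  # denied
```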

19 RAL Network Re-Design & Other Issues

20 RAL Network Re-Design
Two main aims:
1. Resilience: reduce serial paths and single points of failure.
2. Scalability and flexibility: remove the need for special cases; make adding bandwidth and adding 'clouds' (e.g. Tier-1 or tenants) a repeatable process with known costs.
[Diagram: current layout - RAL PoP with Site Access Router and UKLight router linking the RAL site (firewall, Router A internal distribution, Tier-1, ISIS, Admin, JASMIN) to the outside world: Janet for "normal" data and the CERN LHC OPN for LHC data]

21 Proposed design
[Diagram: layered structure. Site external connectivity: Janet (primary and backup), CERN LHC OPN, and a commercial ISP for tenants and visitors. RAL PoP (campus access & distribution): Rtr 1 and Rtr 2. Internal site distribution: Sw 1 and Sw 2 feeding the Tier-1 router, Rtr A for the rest of the RAL site, and per-project/facility/department routers. Security: virtual firewalls, with implicit trust relationships allowed to bypass the firewall]

22 Rtr 1 & 2, Sw 1 & 2
- Front: 48 ports of 1/10 GbE (SFP+)
- Back: 4 ports of 40 GbE (QSFP+)
- Lots of 10 Gigs:
  - clouds and new providers can be readily added
  - bandwidth readily added to existing clouds
  - clouds can be dual connected

23 RAL Site Resilience
[Map: diversely routed connections leaving the RAL site - primary to Reading, backup to London]

24 User Education
- The belief that you can plug a node or cluster into "the network" and immediately be firing lots of data all over the world is a fallacy
- Over-provisioning is not a complete solution
- Having invested £m's elsewhere, most network problems that do arise are within the last mile: campus network → individual devices → applications
- On the end systems...
  - Network Interface Card
  - Hard disc
  - TCP configuration
  - Poor cabling
  - Does your application use parallel TCP streams?
  - What protocols does your application use for data transfer (GridFTP, HTTP...)?
- Know what to do on your end systems; know what questions to ask of others
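The parallel-TCP-streams question can be illustrated with a minimal sketch: split one transfer across several connections, as tools like GridFTP do, so that no single stream's window limit caps the whole transfer. This toy version sends to a local listener standing in for the remote end; hosts, ports and sizes are illustrative:

```python
import socket
import threading

def serve(listener, totals, idx):
    """Accept one connection and count the bytes received on it."""
    conn, _ = listener.accept()
    with conn:
        while True:
            chunk = conn.recv(65536)
            if not chunk:
                break
            totals[idx] += len(chunk)

def parallel_send(data, host, port, n_streams):
    """Send `data` split across n_streams parallel TCP connections."""
    part = len(data) // n_streams
    def send_slice(i):
        lo = i * part
        hi = len(data) if i == n_streams - 1 else lo + part
        with socket.create_connection((host, port)) as s:
            s.sendall(data[lo:hi])
    threads = [threading.Thread(target=send_slice, args=(i,))
               for i in range(n_streams)]
    for t in threads: t.start()
    for t in threads: t.join()

# Local listener plays the part of the far end of the transfer.
listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(8)
port = listener.getsockname()[1]
N = 4
totals = [0] * N
servers = [threading.Thread(target=serve, args=(listener, totals, i))
           for i in range(N)]
for t in servers: t.start()
parallel_send(b"x" * 1_000_000, "127.0.0.1", port, N)
for t in servers: t.join()
listener.close()
print(sum(totals))  # 1000000 - all bytes arrived across 4 streams
```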

25 User Support
2010 example: CMIP5 - RAL Space sharing environmental data with Lawrence Livermore (US west coast) and DKRZ (Germany)
- ESNet, California → GÉANT, London: 800 Mbps
- ESNet, California → RAL Space: 30 Mbps
- RAL Space → DKRZ, Germany: 40 Mbps
- So RAL is the problem, right? Not necessarily...
- DKRZ, Germany → RAL Space: up to 700 Mbps
Involved six distinct parties: RAL Space, STFC Networking, Janet, DANTE, ESNet, LLNL
Difficult, although the experiences probably fed into the aforementioned JASMIN
Tildesley's Strategic Vision for UK e-Infrastructure talks of "the additional effort to provide the skills and training needed for advice and guidance on matching end-systems to high-capacity networks"
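Asymmetries like those above often come down to the bandwidth-delay product: a single TCP stream cannot exceed window size divided by round-trip time. A back-of-envelope check (the RTT and window values are illustrative, not measurements from this case):

```python
def max_tcp_throughput_mbps(window_bytes, rtt_ms):
    """Upper bound for one TCP stream: window / round-trip time, in Mbps."""
    return (window_bytes * 8) / (rtt_ms / 1000) / 1e6

# A default 64 KiB window over an illustrative 150 ms transatlantic RTT:
print(round(max_tcp_throughput_mbps(64 * 1024, 150), 1))  # 3.5 Mbps

# Window needed to fill 1 Gbps at that RTT: bandwidth x RTT
print(int(1e9 / 8 * 0.150 / 1024**2), "MiB")  # 17 MiB
```

A few Mbps from an untuned host over a long path is exactly the shape of the 30-40 Mbps transatlantic numbers above, which is why end-system TCP configuration (or parallel streams) matters as much as link capacity.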

26 I'll do anything for a free lunch
Access Control and Identity Management:
- During the DTI's e-Science programme, access to resources was often controlled using personal X.509 certificates. Is that scalable? Will you run, or pay for, a PKI?
- Resource providers may want to try Moonshot: an extension of eduroam technology, where users of e-Infrastructure resources are authenticated with credentials held by their employer
- Will the Janet Brokerage be applicable to HPC e-Infrastructure resources?

27 Conclusions
From the STFC networking perspective:
- Adding bandwidth should be a repeatable process with known costs
- Networking is now a core utility, just like electricity: plan for resilience on many levels
- Plan for commercial interaction
- In all the excitement, don't forget security
- e-Infrastructure funding is paying for capital investments - be aware of the recurrent costs
