
1 Computational Science at STFC and the Exploitation of Novel HPC Architectures
Mike Ashworth, Scientific Computing Department and STFC Hartree Centre, STFC Daresbury Laboratory
mike.ashworth@stfc.ac.uk

2 STFC’s Scientific Computing Department STFC’s Hartree Centre Exploitation of Novel HPC Architectures

3 STFC’s Scientific Computing Department STFC’s Hartree Centre Exploitation of Novel HPC Architectures

4 Organisation: HM Government (& HM Treasury), RCUK Executive Group

5 STFC’s Sites: Joint Astronomy Centre, Hawaii; Isaac Newton Group of Telescopes, La Palma; UK Astronomy Technology Centre, Edinburgh, Scotland; Polaris House, Swindon, Wiltshire; Chilbolton Observatory, Stockbridge, Hampshire; Daresbury Laboratory, Daresbury Science and Innovation Campus, Warrington, Cheshire; Rutherford Appleton Laboratory, Harwell Science and Innovation Campus, Didcot, Oxfordshire

6 Understanding our Universe: STFC’s Science Programme
Particle Physics: Large Hadron Collider (LHC), CERN – the structure and forces of nature
Ground-based Astronomy: European Southern Observatory (ESO), Chile; Very Large Telescope (VLT), Atacama Large Millimeter Array (ALMA), European Extremely Large Telescope (E-ELT), Square Kilometre Array (SKA)
Space-based Astronomy: European Space Agency (ESA); Herschel/Planck/GAIA/James Webb Space Telescope (JWST); bi-laterals – NASA, JAXA, etc.; STFC Space Science Technology Department
Nuclear Physics: Facility for Antiproton and Ion Research (FAIR), Germany; nuclear skills for medicine (isotopes and radiation applications), energy (nuclear power plants) and environment (nuclear waste disposal)

7 STFC’s Facilities
Neutron Sources: ISIS pulsed neutron and muon source, and the Institut Laue-Langevin (ILL), Grenoble – providing powerful insights into key areas of energy, biomedical research, climate, environment and security.
High Power Lasers: Central Laser Facility – providing applications in bioscience and nanotechnology; HiPER – demonstrating laser-driven fusion as a future source of sustainable, clean energy.
Light Sources: Diamond Light Source Limited (86%) – providing new breakthroughs in medicine, environmental and materials science, engineering, electronics and cultural heritage; European Synchrotron Radiation Facility (ESRF), Grenoble.

8 Scientific Computing Department
Director: Adrian Wander, appointed 24th July 2012
Major funded activities:
– 190 staff supporting over 7500 users
– Applications development and support
– Compute and data facilities and services
– Research: over 100 publications per annum
– Deliver over 3500 training days per annum
– Systems administration, data services, high-performance computing, numerical analysis & software engineering
Major science themes and capabilities: expertise across the length and time scales, from processes occurring inside atoms to environmental modelling

9 The UK National Supercomputing Facilities
The UK National Supercomputing Services are managed by EPSRC on behalf of the UK academic communities.
HPCx ran from 2002 to 2009 using IBM POWER4 and POWER5.
HECToR is the current service (2007-2014), located at Edinburgh and operated jointly by STFC and EPCC; HECToR Phase 3 is a 90,112-core Cray XE6 (660 Tflop/s Linpack).
ARCHER is the new service: early access now, service starts 16th Dec ’13, operated by STFC and EPCC (1.37 Pflop/s Linpack, #19 in the TOP500).

10 PRACE
– PRACE launched 9th June 2010
– 25 member countries; seat in Belgium
– France, Germany, Italy and Spain have each committed €100M over 5 years
– EC funding of €70M for infrastructure
– Tier-0 infrastructure providing a free-of-charge service for European scientific communities based on peer review
– Four projects to date: PP, 1IP, 2IP, 3IP, overlapping; 1IP finished, 2IP extended, 3IP about half way through
– EPSRC represents the UK on the PRACE Council; STFC and EPCC carry out technical work in the PRACE projects
– STFC’s focus is on application optimization & benchmarking, technology evaluation, procurement procedures and training
– STFC contributes 2.5% of BlueGene/Q into the PRACE DECI calls

11 Scientific Highlights
– Journal of Materials Chemistry 16 no. 20 (May 2006): issue devoted to HPC in materials chemistry (esp. use of HPCx)
– Phys. Stat. Sol. (b) 243 no. 11 (Sept 2006): issue featuring scientific highlights of the Psi-k Network (the European network on the electronic structure of condensed matter, coordinated by our Band Theory Group)
– Molecular Simulation 32 no. 12-13 (Oct/Nov 2006): special issue on applications of the DL_POLY MD program written & developed by Bill Smith (the 2nd special edition of Mol Sim on DL_POLY; the 1st was about 5 years earlier)
– Acta Crystallographica Section D 63 part 1 (Jan 2007): proceedings of the CCP4 Study Weekend on protein crystallography
– The Aeronautical Journal 111 no. 1117 (March 2007): UK Applied Aerodynamics Consortium special edition
– Proc. Roy. Soc. A 467 no. 2131 (July 2011): HPC in the Chemistry and Physics of Materials
Last 5 years’ metrics:
– 67 grants of order £13M
– 422 refereed papers and 275 presentations
– Three senior staff have joint appointments with Universities
– Seven staff have visiting professorships
– Six members of staff awarded Senior Fellowships or Fellowships by the Research Councils’ individual merit scheme
– Five staff are Fellows of senior learned societies

12 STFC’s Scientific Computing Department STFC’s Hartree Centre Exploitation of Novel HPC Architectures

13 Opportunities
Political Opportunity: demonstrate growth through economic and societal impact from investments in HPC
Business Opportunity: engage industry in HPC simulation for competitive advantage
Technical Opportunity: exploit multi-core; exploit new Petascale and Exascale architectures; adapt to multi-core and hybrid architectures
Scientific Opportunity: build multi-scale, multi-physics coupled apps; tackle complex Grand Challenge problems

14 Tildesley Report BIS commissioned a report on the strategic vision for a UK e-Infrastructure for Science and Business. Prof Dominic Tildesley led the team including representatives from Universities, Research Councils, industry and JANET. The scope included compute, software, data, networks, training and security. Mike Ashworth, Richard Blake and John Bancroft from STFC provided input. Published in December 2011. Google the title to download from the BIS website

15 Government Investment in e-infrastructure – 2011
17th Aug 2011: Prime Minister David Cameron confirmed £10M investment into STFC's Daresbury Laboratory, £7.5M of it for computing infrastructure.
3rd Oct 2011: Chancellor George Osborne announced £145M for e-infrastructure at the Conservative Party Conference.
4th Oct 2011: Science Minister David Willetts indicated £30M investment in the Hartree Centre.
30th Mar 2012: John Womersley, CEO STFC, and Simon Pendlebury, IBM, signed a major collaboration at the Hartree Centre.

16 Intel collaboration STFC and Intel have signed an MOU to develop and test technology that will be required to power the supercomputers of tomorrow. Karl Solchenbach, Director of European Exascale Computing at Intel said "We will use STFC's leading expertise in scalable applications to address the challenges of exascale computing in a co-design approach."

17 Collaboration with Unilever
1st Feb 2013: Also announced was a key partnership with Unilever on the development of Computer Aided Formulation (CAF).
Months of laboratory bench work can be completed within minutes by a tool designed to run as an ‘App’ on a tablet or laptop connected remotely to the Blue Joule supercomputer at Daresbury.
The tool predicts the behaviour and structure of different concentrations of liquid compounds, both in the bottle and in use, and helps researchers plan fewer and more focussed experiments. The aggregation of surfactant molecules into micelles is an important process in product formulation.
[Photo: John Womersley, CEO STFC, and Jim Crilly, Senior Vice President, Strategic Science Group at Unilever]

18 Hartree Centre IBM BG/Q Blue Joule
TOP500: #13 in the Jun 2012 list, #23 in the Nov 2013 list; #8 in Europe; #2 system in the UK
6 racks: 98,304 cores, 6144 nodes, 16 cores & 16 GB per node, 1.25 Pflop/s peak
1 rack to be configured as BGAS (Blue Gene Advanced Storage): 16,384 cores, up to 1 PB Flash memory
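
A quick consistency check on the figures above (a back-of-envelope calculation, assuming the standard BG/Q core specification of 1.6 GHz A2 cores with a 4-wide double-precision FMA unit, i.e. 8 flops per cycle per core): 6144 nodes × 16 cores per node = 98,304 cores, and 98,304 cores × 1.6 GHz × 8 flops/cycle ≈ 1.26 Pflop/s, consistent with the quoted 1.25 Pflop/s peak.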

19 Hartree Centre IBM iDataPlex Blue Wonder
TOP500: #114 in the Jun 2012 list, #283 in the Nov 2013 list
8192 cores, 170 Tflop/s peak; each node has 16 cores, 2 sockets, Intel Sandy Bridge (AVX etc.)
252 nodes with 32 GB, 4 nodes with 256 GB, 12 nodes with X3090 GPUs, 256 nodes with 128 GB
ScaleMP virtualization software gives up to 4 TB of virtual shared memory

20 Hartree Centre Datastore: 5.76 PB usable disk storage, 15 PB tape store

21 Hartree Centre Visualization
Four major facilities:
– Hartree Vis-1: a large visualization “wall” supporting stereo
– Hartree Vis-2: a large surround and immersive visualization system
– Hartree ISIC: a large visualization “wall” supporting stereo at ISIC
– Hartree Atlas: a large visualization “wall” supporting stereo in the Atlas Building at RAL, part of the Harwell Imaging Partnership (HIP)
Virtalis is the hardware supplier

22 Home of the 2nd most powerful supercomputer in the UK

23 Douglas Rayner Hartree: Father of Computational Science
Douglas Rayner Hartree PhD, FRS (1897–1958): Hartree–Fock method, Appleton–Hartree equation, the Differential Analyser, numerical analysis.
“It may well be that the high-speed digital computer will have as great an influence on civilization as the advent of nuclear power” (1946)
[Photo: Douglas Hartree with Phyllis Nicolson at the Hartree Differential Analyser at Manchester University]

24 Hartree Centre Official Opening
1st Feb 2013: Chancellor George Osborne and Science Minister David Willetts opened the Hartree Centre and announced a further £185M of funding for e-Infrastructure: £19M for the Hartree Centre for power-efficient computing technologies and £11M for the UK’s participation in the Square Kilometre Array.
This investment forms part of the £600 million investment for science announced by the Chancellor at the Autumn Statement 2012. “By putting our money into science we are supporting the economy of tomorrow.”
[Photo: George Osborne opens the Hartree Centre, 1st February 2013]

25 Work with us to: sharpen your innovation; improve your global competitiveness; reduce research and development costs; reduce costs for certification; speed up your time to market.

26 Power-efficient Technologies ‘Shopping List’
£19M investment in power-efficient technologies:
– System with latest NVIDIA Kepler GPUs
– System based on Intel Xeon Phi
– System based on ARM processors
– Active storage project using IBM BGAS
– Dataflow architecture based on FPGAs
– Instrumented machine room
Systems will be made available for development and evaluation projects with Hartree Centre partners from industry, government and academia.

27 STFC’s Scientific Computing Department STFC’s Hartree Centre Exploitation of Novel HPC Architectures

28 Accelerators in the TOP500

29 Where are the accelerators?
TOP500 1-10 accelerator-based systems:
  #1  Tianhe-2    33.8 Pflop/s   Intel Xeon Phi   3120k (2736k) cores
  #2  Titan       17.6 Pflop/s   Nvidia K20x       560k (262k) cores
  #6  Piz Daint    6.27 Pflop/s  Nvidia K20x       116k (74k) cores
  #7  Stampede     5.17 Pflop/s  Intel Xeon Phi    462k (366k) cores
UK National resources: the STFC Hartree Centre IBM iDataPlex has 48 Nvidia Fermi GPUs
UK Regional resources: e-Infrastructure South’s EMERALD consists of 372 Fermi GPUs
Local resources: University and departmental clusters
Under your desk?

30 The Power Wall
Transistor density is still increasing, but clock frequency is not, due to power density constraints.
The number of cores per chip is increasing: multi-core CPUs (currently 8-16 cores) and GPUs (~500 cores).
There is little further scope for instruction-level parallelism.
Source: Intel, Microsoft (Sutter) and Stanford (Olukotun, Hammond)

31 Processor Comparison
  Processor             Parallelism   GB/s   Tflop/s   Gflop/s/W
  Nvidia K40            15x64         288    1.43      6.1
  AMD FirePro S10000    2x56x32 *     480    1.48      3.9
  Intel Xeon Phi        60x8          320    1.00      4.5
  Intel E5-2667W        16x4          50     0.20      1.3
  * single-precision cores; double precision is 1/4.
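
(The efficiency column is peak performance per watt; assuming the power used is the board or socket TDP, which is not stated on the slide, the implied power can be read back out, e.g. for the K40: 1430 Gflop/s ÷ 6.1 Gflop/s/W ≈ 235 W.)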

32 Software Challenges
– OpenMP for multi-core
– CUDA for Nvidia GPUs
– OpenCL
– OpenACC: directive-based, from Cray (& PGI)
– OpenMP 4 for accelerators
– Fortran/C + MPI
The fundamental challenge is extracting additional parallelism in your application (a small directive sketch follows below).
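
As an illustration of the directive-based options above, here is a minimal sketch (an invented example, not taken from any Hartree or Met Office code) showing a simple loop annotated for OpenMP, with the equivalent OpenACC directive shown as a comment:

    ! Minimal sketch (invented example): the same loop parallelised with
    ! OpenMP for a multi-core CPU; the commented !$acc line shows the
    ! equivalent OpenACC directive for an accelerator build.
    subroutine saxpy(n, a, x, y)
      implicit none
      integer, intent(in)    :: n
      real,    intent(in)    :: a, x(n)
      real,    intent(inout) :: y(n)
      integer :: i

      !$omp parallel do
      ! OpenACC alternative:  !$acc parallel loop copyin(x) copy(y)
      do i = 1, n
         y(i) = a*x(i) + y(i)
      end do
      !$omp end parallel do
    end subroutine saxpy

The appeal of the directive approach is that the serial loop is left intact, and compilers that do not support the directives simply ignore them.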

33 Gung-Ho – a new atmospheric dynamical core; NEMO on GPUs; DL_POLY on GPUs; LBM on GPUs

34 Current Unified Model
Met Office Unified Model: ‘unified’ in the sense of using the same code for weather forecasting and for climate research.
Combines dynamics on a lat/long grid with physics (radiation, clouds, precipitation, convection etc.).
Also couples to other models (ocean, sea-ice, land surface, chemistry/aerosols etc.) for improved forecasting and earth system modelling.
“New Dynamics”: Davies et al (2005); “ENDGame” to be operational in 2013.
[Chart: performance of the UM (dark blue) versus a basket of models, measured by 3-day surface pressure errors, 2003-2011. From Nigel Wood, Met Office]

35 Limits to Scalability of the UM
The problem lies with the spacing of the lat/long grid at the poles: at 25 km resolution, grid spacing near the poles is 75 m; at 10 km this reduces to 12 m!
The current version (New Dynamics) has limited scalability. The latest ENDGame code improves this, but a more radical solution is required for Petascale and beyond.
[Chart: UM (17 km) scaling against POWER7 nodes, compared with perfect scaling. From Nigel Wood, Met Office]
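
A short note on why the polar spacing collapses so quickly (a generic lat/long argument, not Met Office material; the exact constant depends on the grid staggering). The zonal spacing at latitude $\varphi$ is

$$ \Delta x(\varphi) = \Delta x_{\mathrm{eq}} \cos\varphi , $$

and the grid row nearest the pole sits at a co-latitude proportional to $\Delta\varphi$, which is itself proportional to the nominal resolution, so the polar spacing scales quadratically:

$$ \Delta x_{\mathrm{pole}} \propto \Delta x_{\mathrm{eq}}^{\,2} . $$

Going from 25 km to 10 km therefore shrinks it by a factor of (10/25)^2 = 0.16, and 75 m × 0.16 ≈ 12 m, consistent with the figures above.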

36 Challenging Solutions
GUNG-HO targets a brand new dynamical core:
– Scalability: choose a globally uniform grid which has no poles (candidate grids: cubed-sphere, triangles, Yin-Yang)
– Speed: maintain performance at high & low resolution and for high & low core counts
– Accuracy: need to maintain the standing of the model
Space weather implies a 600 km deep model.
Five-year project 2011-2015; operational weather forecasts around 2020!
GUNG-HO: Globally Uniform Next Generation Highly Optimized (“working together harmoniously”)
From Nigel Wood, Met Office

37 Design considerations
Ford et al, “Gung Ho: A code design for weather and climate prediction on exascale machines”, EASC 2013, to appear in a special edition of Advances in Engineering Software.
– Fortran 2003, MPI, OpenMP and OpenACC; other models, e.g. PGAS, CAF, are not excluded
– Indirect addressing in the horizontal to support a wide range of possible grids
– Direct addressing in the vertical
– Vertical index innermost is optimal for cache re-use on CPUs, and can also achieve coalesced memory access on GPUs (see the sketch below)
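
A minimal sketch of what these addressing choices look like in practice (illustrative only; the names and interface are invented and this is not Gung-Ho source code):

    ! Sketch only (names invented): a column kernel with indirect addressing
    ! in the horizontal (via a dofmap) and direct, innermost addressing in
    ! the vertical.
    subroutine column_kernel(nlayers, dofmap, field, result)
      implicit none
      integer, intent(in)    :: nlayers
      integer, intent(in)    :: dofmap(:)     ! horizontal cell -> storage index
      real,    intent(in)    :: field(:,:)    ! (layer, dof): vertical index first
      real,    intent(inout) :: result
      integer :: d, k

      do d = 1, size(dofmap)                  ! horizontal: indirect
         do k = 1, nlayers                    ! vertical: direct, innermost
            result = result + field(k, dofmap(d))
         end do
      end do
    end subroutine column_kernel

With the vertical index first and the inner loop over layers, consecutive iterations walk through memory with unit stride, which is what gives the cache re-use on CPUs and coalesced loads on GPUs mentioned above.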

38 Software architecture
The Gung-Ho software architecture is structured into layers communicating via a defined API:
– the driver layer (control for one or more models)
– the algorithm layer (high-level specification)
– the parallelisation system (PSy) (inter-node and intra-node parallelism, parsing, transformations, hardware-specific code generation)
– the kernel layer (toolkit of algorithm building blocks with directives)
– the infrastructure layer (generic library to support parallelisation, communications, coupling, I/O etc.)

39 Gung-Ho Single Model architecture
The arrows represent the APIs connecting the layers; their direction shows the flow of control. A code generator will parse the algorithm layer source code.

40 Structure
[Diagram: the Parser reads the Alg Code and Kernel Codes; the Algorithm Generator produces the transformed Alg Code and the PSy Generator produces the PSy Code]

41 Generator
    ...
    ast, invokeInfo = parse(filename, invoke_name=invokeName)
    alg = algGen(ast, invokeInfo, psyName=psyName, invokeName=invokeName)
    psy = psyGen(invokeInfo, psyName=psyName)
    ...

42 Invoking the generator
    integrate_one_generate:
        python ../generator/generator.py -oalg integrate_one_alg.F90 -opsy integrate_one_psy.F90 integrate_one.F90

    make integrate_one_generated

    rupert@ubuntu:~/proj/GungHoSVN/LFRIC/src/generator$ python generator.py
    usage: generator.py [-h] [-oalg OALG] [-opsy OPSY] filename
    generator.py: error: too few arguments

43 Example (integrate_one)
Algorithm layer as written:
    program main
      ...
      use integrate_one_module, only : integrate_one_kernel
      ...
      call invoke(integrate_one_kernel(x, integral))
      ...
    end program main
After code generation:
    PROGRAM main
      ...
      USE psy, ONLY: invoke_integrate_one_kernel
      ...
      CALL invoke_integrate_one_kernel(x, integral)
      ...
    END PROGRAM main

44 Example (integrate_one)
    module integrate_one_module
      use kernel_mod
      implicit none
      private
      public integrate_one_kernel
      public integrate_one_code

      type, extends(kernel_type) :: integrate_one_kernel
        type(arg) :: meta_args(2) = (/ &
             arg(READ, (CG(1)*CG(1))**3, FE), &
             arg(SUM, R, FE) /)
        integer :: ITERATES_OVER = CELLS
      contains
        procedure, nopass :: code => integrate_one_code
      end type integrate_one_kernel

    contains

      subroutine integrate_one_code(layers, p1dofm, X, R)
      ...

45 Example (integrate_one)
    MODULE psy
      USE integrate_one_module, ONLY: integrate_one_code
      USE lfric
      IMPLICIT NONE
    CONTAINS
      SUBROUTINE invoke_integrate_one_kernel(x, integral)
        ...
        SELECT TYPE ( x_space => x%function_space )
        TYPE IS ( FunctionSpace_type )
          topology => x_space%topology
          nlayers = topology%layer_count()
          p1dofmap => x_space%dof_map(cells, fe)
        END SELECT
        DO column = 1, topology%entity_counts(cells)
          CALL integrate_one_code(nLayers, p1dofmap(:,column), x%data, integral%data(1))
        END DO
      END SUBROUTINE invoke_integrate_one_kernel
    END MODULE psy

46 Timetable
– Further development and testing of horizontal [2013]
– Testing of proposals for code architecture [2013]
– Vertical discretization [2013]
– 3D prototype development [2014-2015]
– Operational around 2020…?

47 Unified Model on Intel Xeon Phi ‘Knights Corner’
Tests using radiation code from the UM on Intel Xeon Phi (KNC), a pre-production chip.
OpenMP is not widely used in the UM, and the code is not optimised for Xeon Phi.

48 NEMO Acceleration on GPUs using OpenACC
Maxim Milakov, Peter Messmer, Thomas Bradley, NVIDIA
Flat profile; code converted using OpenACC directives.
GYRE configuration only – we are looking at more realistic test cases, with ice and land.
Tesla M2090 GPUs, Westmere CPUs.
Milakov, Messmer, Bradley, GPU Technology Conference, 18th-22nd March 2013
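
To make the directive approach concrete, here is a minimal sketch of a generic 2D field update with OpenACC (an invented example, not NEMO source; the array names are illustrative). For a flat-profile code the data region is the important part: it keeps the arrays resident on the GPU across time steps so that host-device transfers are not paid on every loop.

    ! Sketch only (invented example, not NEMO source): a generic 2D field
    ! update with OpenACC. The data region keeps the arrays on the GPU for
    ! the whole time loop; only the kernels run inside it.
    subroutine step_field(nx, ny, nt, t, tn)
      implicit none
      integer, intent(in)    :: nx, ny, nt
      real,    intent(inout) :: t(nx, ny)
      real,    intent(inout) :: tn(nx, ny)
      integer :: i, j, n

      !$acc data copy(t) create(tn)
      do n = 1, nt
         !$acc parallel loop collapse(2)
         do j = 2, ny - 1
            do i = 2, nx - 1
               tn(i, j) = 0.25 * (t(i-1, j) + t(i+1, j) + t(i, j-1) + t(i, j+1))
            end do
         end do
         !$acc parallel loop collapse(2)
         do j = 2, ny - 1
            do i = 2, nx - 1
               t(i, j) = tn(i, j)
            end do
         end do
      end do
      !$acc end data
    end subroutine step_field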

49 DL_POLY Acceleration on GPUs
Christos Kartsaklis and Ruairi Nestor, ICHEC; Ilian Todorov and Bill Smith, STFC
CUDA implementation of key DL_POLY features: constraints (SHAKE), link-cell pairs, two-body forces, Ewald SPME forces.
Benchmark: DMPC (dimyristoylphosphatidylcholine) in water, 413,896 atoms (test case 4)

50 DL_POLY Acceleration on GPUs
“Benchmarking and Analysis of DL_POLY 4 on GPU Clusters”, Lysaght et al, PRACE report.
Significant 8x speed-up using cuFFT.
The MPI code scales to 10^8 atoms on more than 10^5 cores.
Pure MPI vs. 2 GPUs on the ICHEC Stokes GPU cluster; sodium chloride, 216,000 ions (test case 2)

51 3D LBM on Kepler GPUs (1)
Mark Mawson & Alistair Revell, Manchester
Lattice Boltzmann Method (LBM) for solving fluid flow; focus on memory transfer issues.
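
For context, the single-relaxation-time (BGK) update at the heart of LBM, in its textbook form (not taken from the Manchester implementation): each distribution $f_i$ streams along its lattice velocity $\mathbf{c}_i$ and relaxes towards a local equilibrium $f_i^{\mathrm{eq}}$ with relaxation time $\tau$,

$$ f_i(\mathbf{x} + \mathbf{c}_i \Delta t,\ t + \Delta t) = f_i(\mathbf{x}, t) - \frac{1}{\tau}\left[ f_i(\mathbf{x}, t) - f_i^{\mathrm{eq}}(\mathbf{x}, t) \right]. $$

Every lattice site updates independently from purely local data, which is why the method maps so well onto GPUs and why performance is dominated by memory traffic rather than arithmetic.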

52 SKA
STFC is a partner in the Science Data Processor (SDP) work package.
Stephen Pickles leads the software engineering task in SKA.TEL.SDP.ARCH.SWE, developing the software system-level prototype.
We are also contributing to the work to build, run and support the SDP software prototypes on existing production HPC facilities (SKA.TEL.SDP.PROT.ISP).
Work started on 1st November 2013.

53 Summary
New UK Government investment supports a wide range of e-Infrastructure projects (incl. data centres, networks, ARCHER, Hartree).
The Hartree Centre is ‘open for business’.
We are driving forward application development on emerging architectures.
For more information see http://www.stfc.ac.uk/scd

54 If you have been…
…thank you for listening
Mike Ashworth
mike.ashworth@stfc.ac.uk
http://www.stfc.ac.uk/scd

