NeSI: Providing HPC Resources for the NZ Science Community Dr. Michael J Uddstrom Director, NIWA HPCF (on behalf of the NeSI Team)

2 Outline
NeSI – How it Happened…
NeSI – Goals & Purposes;
Defining HPC & Fit for Purpose;
NeSI Capabilities:
–HPC Resources;
–Services.
How to gain Research Access:
–The Process;
–The Allocation Classes;
–The Responsibilities.
What we need to do to safeguard the future…
Summary;
Contact details.

3 How did NeSI Arise?
In October 2010 an Investment Case entitled:
–“National eScience Infrastructure (NeSI) High Performance Computational Platforms and Services for NZ’s Research Communities”
was submitted to the Minister of Research, Science & Technology.
It was prepared by a Working Group of representatives from UoA, UC, UoO, Landcare, AgResearch & NIWA (under an independent Chair).
The Investment Case asserted that:
–HPC and related eScience infrastructure are indispensable components of modern science, and are having a major impact on almost every branch of research;
–By taking a sector approach, more efficient coordination and cooperation would be achieved, leading to strategically targeted investment in HPC;
–Thereby providing international-scale HPC to a wide range of communities and disciplines.
It was formulated following a “Needs Analysis” from…

4 What “we” said about our HPC needs…
In 2010 the NZ research community was surveyed to determine its existing and anticipated HPC requirements. We (~194 of us) said (e.g.):
–Processors to run a code (categories from < … to > 10,000): In …: 39%, 15%, 5%; In …: 32%, 38%, 17%;
–File space per experiment (< 100 GB, … TB, 100 TB, > 1 PB): In …: 35%, 0%, 0.6%; In …: 58%, 13%, 3%;
–Off-site data transfers per day (< 100 MB, 1 GB, 1 TB, > 10 TB): In …: 42%, 21%, 4%; In …: 28%, 33%, 23%.

5 How is NeSI Funded?
There are three anchor partners (Principal Investors):
–University of Auckland (with Crown funding of ~$2.2M pa);
–University of Canterbury (with Crown funding of ~$2.6M pa);
–NIWA (with Crown funding of ~$1.0M pa);
And two associate members (Associate Investors): Landcare and the University of Otago.
The Crown’s investment is $27M over 4 years.
Institutional investment (partners & associates) is $21M over 4 years:
–$15.4M in Capital Expenditure & $5.6M in Operational Expenditure.
Which provides:
–4 × HPC systems at the Universities of Auckland & Canterbury, and at NIWA;
–NeSI Directorate at Auckland (5 staff positions + admin): 5.5 FTEs;
–Systems Engineers: ~5.5 FTEs; Site Management: 1.8 FTEs;
–HPC Specialist Scientific Programmers: ~5.7 FTEs.
The “Partners” (called collaborators) control 60% of the HPC capacity for their own purposes, and 40% is available for “Merit” / “Research” access.

6 What is NeSI Focused on?
Creating an advanced, scalable computing infrastructure to support New Zealand’s research communities;
–i.e. international scale as opposed to institutional/department scale.
Providing the grid middleware, research tools and applications, data management, user-support, and community engagement needed for the best possible uptake (of HPC);
–i.e. enable efficient use of these systems.
Encouraging a high level of coordination and cooperation within the research sector;
–i.e. fit the science to the HPC, as opposed to “my institution’s resources”.
Contributing to high quality research outputs from the application of advanced computing and data management techniques and associated services, which support the Government’s published priorities for science.
–i.e. it’s all about better science to underpin national outcomes.

7 If it’s about HPC… what is an HPC?
There are two “basic” types:
–Capability (aka Supercomputer): provides the maximum computing power available to solve large problems; the emphasis is on problem size (large memory, lots of CPUs), e.g. IBM p775/p7 & p575/p6, Cray XK6, IBM BG/Q & BG/P;
–Capacity: typically uses efficient, cost-effective computing power; the emphasis is on throughput (dealing with loads larger than a single PC/small cluster can handle), e.g. IBM iDataPlex, HP Cluster Platform n000.
The essential differences are:
–the interconnect fabric performance;
–the processor performance; and
–reliability (i.e. resiliency to component failure).
Supercomputers have high efficiency:
–Efficiency = sustained performance / peak performance (%).
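The efficiency ratio above is a single division, but it is worth seeing with numbers. The minimal Python sketch below computes it; the sustained and peak figures in the usage line are hypothetical, chosen only for illustration, and are not measurements of any NeSI system.

```python
def efficiency_percent(sustained_flops: float, peak_flops: float) -> float:
    """Return sustained performance as a percentage of theoretical peak performance."""
    if peak_flops <= 0:
        raise ValueError("peak performance must be positive")
    return 100.0 * sustained_flops / peak_flops


# Hypothetical: a code sustaining 6.8 TFLOPS on a machine with a 34 TFLOPS peak.
print(f"{efficiency_percent(6.8e12, 34e12):.1f}%")
```

A capability system is judged by how close real codes get to that ratio's upper limit, not by the peak figure alone.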

8 The Limits to Scalability: Amdahl’s Law
Amdahl's law states that if P is the proportion of a program that can be made parallel, and (1 − P) is the proportion that cannot be parallelized (remains serial), then the maximum speedup that can be achieved by using N processors is:
$$S(N) = \frac{1}{(1 - P) + \frac{P}{N}}$$
In the limit, as N tends to infinity, the maximum speedup tends to 1 / (1 − P).
In practice, the performance-to-price ratio falls rapidly as N is increased once there is even a small component of (1 − P), e.g.:
–if P = 0.99 & N = 64, then speedup = 39.3 (perfect = 64);
–if P = 0.99 & N = 1024, then speedup = 91.2 (perfect = 1024);
–if P = 0.90 & N = 1024, then speedup = 9.9 (perfect = 1024)!
It’s a Challenge… but this is “our” future!!
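The three worked examples on the Amdahl’s law slide can be reproduced directly from the formula. This is a minimal Python sketch; besides the speedup it prints the parallel efficiency S/N, which is one way to see why the performance-to-price ratio collapses as N grows.

```python
def amdahl_speedup(p: float, n: int) -> float:
    """Maximum speedup on n processors when a fraction p of the work can run in parallel."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return 1.0 / ((1.0 - p) + p / n)


# The (P, N) pairs used on the slide.
for p, n in [(0.99, 64), (0.99, 1024), (0.90, 1024)]:
    s = amdahl_speedup(p, n)
    print(f"P={p:.2f}, N={n:5d}: speedup = {s:5.1f} (perfect = {n}), "
          f"efficiency = {100 * s / n:5.1f}%, limit as N grows = {1 / (1 - p):.0f}")
```

With P = 0.90 the speedup can never exceed 10, no matter how many of the 1,024 cores are used.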

9 What types of HPC do we need?
It depends on the problem (and the data locality)!
Is it “Embarrassingly Parallel” (EP)?
–This means the problem can be split into independent tasks (P ≈ 1), with each sent to a different processor, e.g. image rendering, classification, Monte Carlo calculations, BLAST, genetic algorithms, etc.
If not EP, then is it highly scalable?
–This means the problem does not place high demands on processor performance, because the coupling between processors is relatively “loose” (P > 0.999) and you can use very many processors, e.g. materials codes, Schrödinger’s equation, DCA++, LSMS, NWChem…
If neither EP nor highly scalable, is it tightly coupled?
–This means that the problem will place high demands on processor performance and on the interconnect between processors (P > ~0.9), e.g. numerical weather and climate prediction, variational data assimilation, combustion.
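A Monte Carlo estimate of π is a textbook embarrassingly parallel problem of the kind listed on the slide: each task works on its own random samples and the only communication is one sum at the end. The Python sketch below is illustrative; the task split, seeds and sample counts are arbitrary assumptions. By default `mapper` is the built-in (serial) `map`; passing the `map` method of a `concurrent.futures.ProcessPoolExecutor` instead would send the same independent tasks to separate processes without changing the result.

```python
import random


def count_hits(task: tuple[int, int]) -> int:
    """One independent task: count random points that land inside the unit quarter-circle."""
    seed, samples = task
    rng = random.Random(seed)  # each task owns its RNG, so tasks share no state
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_tasks: int, samples_per_task: int, mapper=map) -> float:
    """Split the work into n_tasks independent pieces, run them, then combine once at the end."""
    tasks = [(seed, samples_per_task) for seed in range(n_tasks)]
    total_hits = sum(mapper(count_hits, tasks))  # the only "communication" is this final sum
    return 4.0 * total_hits / (n_tasks * samples_per_task)


if __name__ == "__main__":
    print(estimate_pi(n_tasks=8, samples_per_task=100_000))
```

Because the tasks never exchange data, P is effectively 1 and the interconnect hardly matters, which is why capacity clusters suit this class of work.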

10 Applications running on Jaguar at ORNL (2011)
Domain area | Code name | Institution | # of cores | Performance | Notes
Materials | DCA++ | ORNL | 213,… | … PF | 2008 Gordon Bell Prize Winner
Materials | WL-LSMS | ORNL/ETH | 223,… | … PF | 2009 Gordon Bell Prize Winner
Chemistry | NWChem | PNNL/ORNL | 224,… | … PF | 2008 Gordon Bell Prize Finalist
Nanoscience | OMEN | Duke | 222,720 | > 1 PF | 2010 Gordon Bell Prize Finalist
Biomedical | MoBo | GaTech | 196,… | … TF | 2010 Gordon Bell Prize Winner
Chemistry | MADNESS | UT/ORNL | 140,… | … TF |
Materials | LS3DF | LBL | 147,… | … TF | 2008 Gordon Bell Prize Winner
Seismology | SPECFEM3D | USA (multiple) | 149,… | … TF | 2008 Gordon Bell Prize Finalist
Combustion | S3D | SNL | 147,456 | 83 TF |
Weather | WRF | USA (multiple) | 150,000 | 50 TF |
Highly scalable problem → high fraction of peak performance.
Tightly coupled problem → small fraction of peak performance.
Just 3.6% of DCA++ performance.
Wednesday, July 4, 2012, New Zealand HPC Applications Workshop, Wellington

11 NeSI HPCs in Summary
University of Auckland:
–IBM iDataPlex Intel processor cluster (Pan), large node memory + some exotic hardware (i.e. GPGPUs): general purpose HPC cluster, optimised for EP and highly scalable problems.
University of Canterbury:
–IBM BlueGene/P Supercomputer: optimised for EP and highly scalable problems;
–IBM p755/POWER7 cluster: general purpose / capability HPC cluster;
–IBM iDataPlex Visualisation Cluster.
NIWA High Performance Computing Facility:
–IBM p575/POWER6 Supercomputer (FitzRoy): optimised for tightly coupled (large) problems; operational/production-ready (i.e. IBM support arrangements).

12 NeSI Auckland / Centre for eResearch
Pan cluster:
–IBM iDataPlex: dx360 M3 & M4 servers (912 → 1,904 Intel cores):
76 nodes (Westmere: 12 cores / SMP node, 96 GB/node, 64 bit);
12 GPGPU nodes (Tesla M2090: 2 devices/node);
62 nodes (Sandy Bridge: 16 cores / SMP node, 128 GB/node);
UoO nodes…
–InfiniBand interconnect;
–Linux Red Hat 6.1;
–Storage (General Parallel File System – GPFS): 200 TB SAN user disk.
BeSTGRID Auckland cluster:
–Commodity hardware (approx. 500 cores).
Grid submission support:
–grisu, gricli, Jobs Online and other clients.
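The Pan core counts (912 growing to 1,904) follow from the node counts on the slide. The small Python tally below reproduces them, and also the aggregate node memory, which the slide does not state. It assumes, as the arithmetic suggests but the slide does not say, that the 12 GPGPU nodes and the UoO nodes are not included in the 1,904 figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeGroup:
    name: str
    nodes: int
    cores_per_node: int
    mem_gb_per_node: int


# Node groups as listed for Pan; GPGPU and UoO nodes are left out (assumption).
WESTMERE = NodeGroup("Westmere", nodes=76, cores_per_node=12, mem_gb_per_node=96)
SANDY_BRIDGE = NodeGroup("Sandy Bridge", nodes=62, cores_per_node=16, mem_gb_per_node=128)


def totals(groups: list[NodeGroup]) -> tuple[int, int]:
    """Return (total cores, total memory in GB) over a list of node groups."""
    cores = sum(g.nodes * g.cores_per_node for g in groups)
    mem = sum(g.nodes * g.mem_gb_per_node for g in groups)
    return cores, mem


before = totals([WESTMERE])
after = totals([WESTMERE, SANDY_BRIDGE])
print(f"before: {before[0]} cores, {before[1]} GB; after: {after[0]} cores, {after[1]} GB")
```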

13 NeSI Auckland / Centre for eResearch
DataFabric (Federated Data Sharing Service):
–Hosted at UoA and UC;
–Integrated Rule-Oriented Data System (iRODS);
–Tuakiri authentication;
–Web interface/WebDAV/FUSE/GridFTP;
–http://df.auckland.ac.nz/
Applications:
–Preinstalled (some licensing may be required):
Math: Gap, Magma, Matlab, Mathematica, R;
BioInformatics: BLAST, BEAST, beagle, PhyML, MrBayes, BEDtools, Bamtools, Bowtie, Clustal Omega, Cufflinks, FastQC, FASTX Toolkit;
Computational Chemistry: Gaussian, Gromacs, AMBER, Orca, VASP;
Engineering: Ansys, Abaqus, OpenFOAM;
Meteorology: WRF, WPS.

14 NeSI Auckland / Centre for eResearch
Scientific libraries:
–Compilers: Fortran, C & C++ (gcc, Intel & PGI);
–BLAS, LAPACK/LAPACK++, ATLAS, FFTW…
Support for custom-built applications:
–Batch submission (non-interactive, non-GUI, preferably parallel processing);
–Compilers (GNU, PGI C/C++ and Fortran, Intel);
–Java, Python, OpenMPI.
CeR NeSI Staff:
–Service Delivery Manager: Marcus Gustafsson;
–Systems Engineers: Yuriy Halytskyy, Aaron Hicks, + 1 TBD;
–HPC Specialist Programmers: Markus Binsteiner, Martin Feller, Ben Roberts, Gene Soudlenkov, + 2 TBD;
–https://wiki.auckland.ac.nz/display/CERES/Centre+for+eResearch

15 NeSI HPCF / NIWA
FitzRoy: IBM p575/p6 Supercomputer:
–Being upgraded in Q4 2012;
–58 (→ 108) POWER6 nodes; 32 cores / SMP node (64 bit);
–1,856 (→ 3,456) × 4.7 GHz cores;
–34 (→ 66) TFLOPS peak; 602 GFLOPS / node;
–5.3 (→ 8.5) TB memory: 64 and 128 GB memory nodes.
InfiniBand interconnect fabric.
Storage: General Parallel File System (GPFS):
–790 TB SAN user disk;
–5 PB Automatic Tape Library storage with Hierarchical Storage Management.
AIX Operating System.
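Theoretical peak figures like FitzRoy’s come from a simple product: cores × clock rate × floating-point operations per cycle. The Python sketch below assumes 4 double-precision flops per cycle per POWER6 core (two fused multiply-add units); that per-core figure is our assumption and is not stated on the slide, but with it the product reproduces the quoted 602 GFLOPS per node and the ~34 TFLOPS system peak.

```python
def peak_gflops(cores: int, clock_ghz: float, flops_per_cycle: int) -> float:
    """Theoretical peak in GFLOPS = cores x clock (GHz) x floating-point ops per cycle."""
    return cores * clock_ghz * flops_per_cycle


POWER6_FLOPS_PER_CYCLE = 4  # assumption: 2 FMA units x 2 flops each, double precision

node = peak_gflops(32, 4.7, POWER6_FLOPS_PER_CYCLE)
system = peak_gflops(1_856, 4.7, POWER6_FLOPS_PER_CYCLE)
print(f"node: {node:.0f} GFLOPS; system: {system / 1000:.1f} TFLOPS")
```

Peak is an upper bound that real codes never reach; the sustained/peak ratio discussed earlier measures how close they get.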

16 NeSI HPCF / NIWA
Operating Environment:
–Documentation: https://teamwork.niwa.co.nz/display/HPCF
–SSH/X logon (via dedicated NeSI login node);
–LoadLeveler (batch queue) to run non-interactive jobs;
–IBM (xl) Fortran (77, 90, 95, 2003), C and C++ compilers;
–IBM High Performance Computing Toolkit (MPI, OpenMP, etc.);
–TotalView graphical debugger;
–Third-party software (e.g.): Make, CMake, Python, Java, git, Subversion, GSL, Hypre, LAPACK, ParMETIS, FFTW, NetCDF (3 & 4), parallel-NetCDF, HDF5, jasper, VisIt;
–Access any set of specific software versions via MODULES.
NIWA NeSI Support Staff:
–Service Delivery Manager: Michael Uddstrom (0.05 FTE);
–Systems Engineers: Chris Edsall, Fabrice Cantos (0.21 FTE each);
–HPC Specialist Scientific Programmer: Mark Cheeseman (0.21 FTE).
Expected system uptime: >99.5%.
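Non-interactive work on FitzRoy goes through LoadLeveler, which reads a job command file of `# @ keyword = value` lines ending with `# @ queue`, followed by the commands to run. The Python sketch below builds such a file for an MPI job. The keyword names are standard LoadLeveler ones, but the class name, node counts, job name and executable in the usage lines are hypothetical placeholders; the actual class names and recommended settings should be taken from the HPCF documentation.

```python
def loadleveler_script(job_name: str, job_class: str, nodes: int, tasks_per_node: int,
                       wall_clock: str, command: str) -> str:
    """Build the text of a LoadLeveler job command file for a parallel (MPI) job.

    Keyword names are standard LoadLeveler; the values passed in are site-specific placeholders.
    """
    lines = [
        "#!/bin/sh",
        f"# @ job_name = {job_name}",
        "# @ job_type = parallel",
        f"# @ class = {job_class}",
        f"# @ node = {nodes}",
        f"# @ tasks_per_node = {tasks_per_node}",
        f"# @ wall_clock_limit = {wall_clock}",
        "# @ output = $(job_name).$(jobid).out",
        "# @ error = $(job_name).$(jobid).err",
        "# @ queue",  # must follow all other keywords for this job step
        command,      # shell commands run once the job starts
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical 64-task run on two 32-core nodes, launched with IBM PE's poe.
    print(loadleveler_script("um_test", "general", nodes=2, tasks_per_node=32,
                             wall_clock="02:00:00", command="poe ./model.exe"))
```

Saved to a file (say `job.ll`), the output would be submitted with `llsubmit job.ll` and monitored with `llq`.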

17 NeSI HPCF / NIWA Applications
Unified Model (UK Met Office / Hadley Centre):
–Weather forecasting (global & regional, to 100 m resolution);
–3DVAR & 4DVAR data assimilation;
–Regional climate modelling: HadGEM3-RA;
–Coupled (atmospheric, ocean, land, sea ice) earth simulation: HadGEM3;
–Chemistry climate modelling: UKCA.
Ocean modelling:
–ROMS (Regional Ocean Model);
–NEMO (Global Ocean Model).
Wave modelling:
–WaveWatch 3, SWAN.
CFD: Gerris (self-refining grid).
Typical job sizes:
–64–1,024 cores & O(10) GB output per job.

18 NeSI BlueFern / University of Canterbury
IBM BlueGene/P:
–2048 nodes (4 cores / SMP node): 8192 × 0.85 GHz PowerPC 450 cores (32 bit); 4 GB memory / node; 13.6 GFLOPS / node, 23 TFLOPS (peak);
–3-dimensional torus interconnect.
IBM p755/POWER7 Cluster (split between AIX & Linux):
–13 nodes (32 cores / SMP node): 416 × 3.3 GHz POWER7 cores (64 bit); 1.7 TB memory (128 GB / node); 769 GFLOPS / node, 10 TFLOPS (peak);
–InfiniBand interconnect.
Storage: General Parallel File System (GPFS):
–180 TB SAN user disk;
–1 PB Automatic Tape Library storage with Hierarchical Storage Management.

19 NeSI BlueFern / University of Canterbury
IBM iDataPlex Visualisation Cluster:
–5 nodes (8 cores / SMP node): 40 × 3.03 GHz Intel Xeon cores; 2 × Tesla M2070Q GPUs / node; 96 GB memory / node.
Applications:
–BG/P: molecular dynamics (NAMD, LAMMPS, VASP etc.), weather forecasting (WRF), protein folding/docking (AMBER, GROMACS etc.), Monte Carlo & researcher codes;
–P755/POWER7: fluid dynamics (Fluent/CFX), genomics (MrBayes etc.), numerical (Octave, R), interpreted languages (Java, Python + SciPy/NumPy) & researcher codes;
–Visualization: visualization tools (VTK, ParaView, VisIt etc.) and high-speed remote graphical sessions (vizstack, turboVNC etc.).

20 NeSI BlueFern / University of Canterbury
Operating Environment:
–Documentation: wiki.canterbury.ac.nz/display/BlueFern/BlueFern+User+Documentation+Wiki
–SSH/X logon;
–VNC remote desktop;
–LoadLeveler (batch queue) to run non-interactive jobs;
–IBM (xl) Fortran (77, 90, 95, 2003), C and C++ compilers;
–IBM High Performance Computing Toolkit (MPI, OpenMP, etc.);
–TotalView graphical debugger;
–Libraries (e.g.): LAPACK, ScaLAPACK, BLAST/MpiBlast, ESSL, FFTW, GSL;
–IDE: Eclipse PTP.
Support:
–Service Delivery Manager: Dan Sun;
–HPC Support Consultants: François Bissey, Céline Cattoën-Gilbert, Tony Dale, Vladimir Mencl.

21 Accessing NeSI Resources (I)
NeSI is focused on providing high quality access to the most important science that needs HPC.
This means that NeSI HPC Allocations are given to Research Projects that meet the following simple criteria:
–The science has been peer reviewed (and approved);
–There is a demonstrable need to use HPC, i.e. researchers / PIs need to submit a Technical Proposal to NeSI explaining what they need, and why, which is assessed by a Panel of Experts.
Meet these, and you will be granted a “Research Allocation” on one or more NeSI HPCs… and NeSI will subsidise 80% of the cost of the HPC core-hours that you need:
–In return… your project needs to fund the remaining 20%…
–And… now that I have your attention… it is recognised that this process will take time, as it needs researchers to explicitly fund HPC in their proposals…

22 Accessing NeSI Resources (II)
During this transition period…
–If you meet the science and technical requirements, but are unable to pay even the 20%, then you will still be provided with an HPC Allocation (i.e. core hours) as “Research-Unfunded”, but:
your jobs will have low priority;
“Research-Funded” jobs (i.e. those that have paid for HPC time) will always have higher priority.
So how much do NeSI HPC resources cost?
–PIs’ projects will pay only 20% of the cost/core-h indicated below.
Platform | Cost per core-h (before subsidy) | Minimum Allocation Unit
UoA iDataPlex | $… | … or 16 cores/node
NIWA HPCF P575/P6 | $… | … cores/node
UC BG/P | $… | … cores / partition
UC P755/P7 | $… | … cores/node
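The per-core-hour prices did not survive in this copy of the slide, so the Python sketch below takes the rate as a parameter; the $0.10 per core-h in the usage line is purely hypothetical. It also assumes that a job is charged for whole minimum allocation units (e.g. whole nodes). The slide lists those units but does not spell out the charging rule, so treat that rounding as an assumption to check against the access policy.

```python
import math


def pi_cost(cores: int, hours: float, rate_per_core_h: float,
            min_unit_cores: int, pi_share: float = 0.20) -> float:
    """Estimate what a project pays for one job under the 80% NeSI subsidy.

    Assumption: cores are rounded up to whole minimum allocation units before charging.
    """
    charged_cores = math.ceil(cores / min_unit_cores) * min_unit_cores
    full_cost = charged_cores * hours * rate_per_core_h
    return pi_share * full_cost


# Hypothetical: 100 cores for 24 h on 16-core nodes at an assumed $0.10 per core-h.
print(f"${pi_cost(100, 24, 0.10, 16):.2f}")
```

Here the 100 requested cores are charged as 7 whole 16-core nodes (112 cores), so matching job sizes to the allocation unit avoids paying for idle cores.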

23 Accessing NeSI Resources (III)
So what benefits do Research Allocation users get?
–They can seek up to 1,000,000 core hours on the UoA, NIWA and/or UC BG/P systems;
–Expert advice and assistance for getting their codes running on these systems (i.e. there are 5.7 FTEs of specialist support available);
–No charge for data storage (or for data transfers)…
–They can use NeSI eResearch infrastructure (post-BeSTGRID tools), e.g.:
DataFabric (iRODS) for sharing data;
Grisu & Gricli grid submission tools;
GridFTP to transfer data etc.;
Tuakiri authentication.
What if I have never used an HPC and don’t know where to begin?
–You can apply for a “Proposal Development” Allocation…
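To get a feel for what a 1,000,000 core-hour allocation buys, note that one run consumes (cores × wall-clock hours) core-hours. The Python sketch below converts an allocation into a number of whole runs; the job shapes in the usage loop (64 and 1,024 cores for 24 hours each) are hypothetical examples.

```python
def runs_affordable(allocation_core_h: float, cores_per_run: int, wall_hours_per_run: float) -> int:
    """Whole runs that fit in an allocation, where one run consumes cores x wall-clock hours."""
    per_run = cores_per_run * wall_hours_per_run
    if per_run <= 0:
        raise ValueError("a run must consume a positive number of core-hours")
    return int(allocation_core_h // per_run)


ALLOCATION = 1_000_000  # core-hours: the upper limit a Research Allocation can seek

for cores, hours in [(64, 24), (1024, 24)]:  # hypothetical job shapes
    print(f"{cores:5d} cores x {hours} h: {runs_affordable(ALLOCATION, cores, hours)} runs")
```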

24 Accessing NeSI Resources (IV)
Proposal Development Allocations are designed to:
–Provide limited access to the HPCs at no cost to the PI;
–Provide an opportunity to:
become familiar with the operating environments on the HPC(s);
port / implement application codes on the target HPC(s);
do preliminary runs to determine the scalability and suitability of the HPC(s) for your problem (is it fit for purpose?);
enable you to complete a Technical Proposal for a Research Allocation.
–Conditions:
one Allocation concurrently per project per PI;
public display of project description and results required.
Teaching Allocations:
–Available on UoA and UC systems;
–To support academic education classes or training workshops;
–Require a publicly viewable class or workshop description;
–No cost.

25 Proposal Development Allocation Application
Does it meet the conditions for a Proposal Development Allocation?
NeSI Technical Application requirements:
–Contact details of the PI and team members etc.;
–Title;
–Outline the scientific goals of the potential future project;
–It would be useful to outline the team’s HPC experience to date;
–Indicate which HPC architecture is likely to be most suitable;
–List the codes to be used;
–Development tools needed (such as compilers, debuggers, profilers, etc.);
–Estimate how much specialist help will be required (e.g. for software porting & installation etc.);
–Indicate how much data storage will be needed.
NeSI staff will be able to assist you in responding to these questions.

26 Research Allocation Application
Does your project meet the Scientific Requirements Test?
NeSI Technical Application requirements:
–Contact details of the PI and team members etc.;
–Title (should relate to the science proposal title);
–Outline the scientific goals / hypothesis and say why you need HPC;
–Outline the deliverables of the overall project (e.g. # papers);
–Outline the team’s HPC experience & the HPC architectures it has worked on in the past (incl. any Proposal Development experience);
–Indicate which HPC architecture is likely to be most suitable (and why):
list the codes to be used and parallelisation methods planned (if any);
indicate the scalability of the code(s) on the target architecture;
–Indicate all software requirements (libraries, compilers): any licensing?
–Specify the size of allocation requested (i.e. core-hours);
–Indicate data storage requirements;
–Estimate how much specialist help will be required (e.g. for optimisation).
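One common way to “indicate the scalability” of a code is a strong-scaling table: time the same problem on increasing core counts and report speedup S = T1/TN and parallel efficiency S/N. The Python sketch below also reports the Karp-Flatt experimentally determined serial fraction e = (1/S − 1/N)/(1 − 1/N), which for a workload that follows Amdahl’s law exactly equals 1 − P. Using Karp-Flatt is our suggestion, not a NeSI requirement, and the timings in the usage block are synthetic (generated from Amdahl’s law with P = 0.99), not measurements.

```python
def scaling_table(timings: dict[int, float]) -> list[tuple[int, float, float, float]]:
    """From wall-clock times keyed by core count, return (N, speedup, efficiency, serial fraction).

    Assumes a 1-core baseline run is included; the serial fraction is undefined (NaN) at N = 1.
    """
    if 1 not in timings:
        raise ValueError("this sketch expects a 1-core baseline run")
    t1 = timings[1]
    rows = []
    for n in sorted(timings):
        speedup = t1 / timings[n]
        efficiency = speedup / n
        # Karp-Flatt metric: experimentally determined serial fraction
        serial = (1 / speedup - 1 / n) / (1 - 1 / n) if n > 1 else float("nan")
        rows.append((n, speedup, efficiency, serial))
    return rows


if __name__ == "__main__":
    # Synthetic timings from Amdahl's law with P = 0.99 and T1 = 1000 s (not real measurements).
    p, t1 = 0.99, 1000.0
    fake = {n: t1 * ((1 - p) + p / n) for n in (1, 8, 64, 512)}
    for n, s, e, f in scaling_table(fake):
        print(f"N={n:4d}  speedup={s:7.1f}  efficiency={e:6.1%}  serial fraction={f:.4f}")
```

With real measurements, a serial fraction that grows with N usually points to communication or load-imbalance overheads rather than a fixed serial section, which is useful evidence when arguing which architecture fits the problem.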

27 NeSI Status Reflection / Review (at 07/12)
Creating an advanced, scalable computing infrastructure to support New Zealand’s research communities:
–3 international-scale HPCs operating: A grade.
Providing the grid middleware, research tools and applications, data management, user-support, and community engagement needed for the best possible uptake (of HPC):
–NeSI is staffed (a few open positions), grid middleware is being developed, community engagement is underway (HPC Workshop, presentations beginning): C+ grade.
Encouraging a high level of coordination and cooperation within the research sector:
–Will always be a challenge, but Auckland, Canterbury & NIWA are working together for the good of NZ science: B+ grade.
Contributing to high quality research outputs… which support the Government’s published priorities for science:
–Too early to tell, but we need to do so within the next 12 months: no grade yet…

28 Summary
NeSI is a big new investment (by the Government) in HPC for NZ science.
It is making a world-class HPC ecosystem available to NZ science.
It is a collaboration between Universities & CRIs.
It is funded until June 2014, but will need to prove its success by September 2013.
To be successful (i.e. attract ongoing funding & HPC access):
–NZ scientists will need to demonstrate their need for HPC (see User Needs Survey);
–This means transitioning from the problems that I can solve on “my” PC and/or departmental cluster to large-scale HPC provided by NeSI;
this is a function of the funding-round cycle too…
It will take time to learn new programming methods & tools (MPI, OpenMP) and new operating environments,
–in the presence of PBRF and publication pressures…

29 Summary
And:
–PIs will have to demonstrate the value of NeSI by funding 20% of their access costs;
this has implications for the way our institutions provide access to Operational Expenditure (most prefer to provide Capital Expenditure);
–Research projects using NeSI will need to generate excellent science (that could not be done without these HPCs);
–And contribute to Government outcomes.
In which case we can expect a long period of HPC funding in the years ahead.

30 Where to get Help
NeSI Central:
–Access Policy: http://www.nesi.org.nz/access-policy
–Eligibility:
–Allocation Classes:
–Application Forms:
–Calls Timetable:
–Storage:
–Case Studies:
Sites:
–CeR:
–NIWA:
–UC:
NeSI staff are both here to help, and willing to help!

31 Extra Slides

32 What we said we need / will need in HPC

