HPC Systems available to QUB Researchers
17th June 2008
Ricky Rankin, Vaughan Purnell
Agenda
  HPC at Queen’s University
  Application Areas
  Observations
  Future
HPC at Queen’s University
  1990 Parallel Computer Centre
    – JISC Initiative
  1996 Centre for Supercomputing in Ireland
    – Partnership with TCD
    – IBM SP2
  Research Support Computing Group within Information Services
    – SRIF funding
Research Support Computing Group
  System support and administration
  Development of applications
  Support of code and applications
  Training
  Providing documentation and examples
  Consultation
High Performance Computing Systems
  Computation
    – 2 Unix Clusters
    – SGI Altix
    – Windows Compute Cluster
      Matlab and other Windows applications
  Visualisation Systems
    HP & SGI – currently based in NITC
  Database Server
HPC Unix Clusters
  Harvey – 2-CPU Itanium2 server nodes
    – Gigabit interconnect
    – HP-UX
  XC 10 node is an rx8600
    Quadrics interconnect
  Both clusters are connected to a 10 Terabyte Storage Area Network (SAN).
Altix 350
  16 Itanium 2 1.4 GHz
  3MB L3 Cache
  32GB main memory
  NUMAlink4 system interconnect
  160GB system disks
  Data Grid connectivity
  Infiniband connectivity
  FPGA board
Windows Compute Cluster
  Head node + 8 worker nodes (total of 32 cores)
  Mix of HP ProLiant
    – DL140-G2 Xeon 3.6GHz, 4GB memory
    – DL140-G3 Xeon 5160 3GHz ("Woodcrest"), 8GB memory
  Head node connected by fibre to 10TB SAN
Windows Compute Cluster
  HP BladeSystem c-Class c7000 (total of 128 cores)
    – Four HP BL460c dual quad-core E5420 blades with 16 GB memory
    – Twelve HP BL260c dual quad-core E5420 blades with 10 GB memory
Visualisation Systems
  Silicon Graphics Prism
    16 x Intel Itanium 2, 1.4GHz
    3MB L3 cache
    32GB main memory
    4 x Graphics Pipes
    Gigabit Ethernet connectivity
  HP Visualization Centre
    5 x HP xw8200 Xeon 64 workstations (running Windows XP)
Database Server
  Mix of HP ProLiant
    – DL140-G2 Xeon 3.6GHz, 4GB memory
Software
  Anything that has a valid license
    – Compute Clusters
      BEAST
      R
    – Windows Compute Cluster
      PAUP
      POWSIM
      MATLAB
      ArcGIS
      ??
    – Database Server
      T1D
[System diagram. Labels: High Performance Computing System, rx8600, rx2600, Quadrics Switch, SAN, Visualisation System, Windows System]
HPC Training
  Internal - start of first semester
    Introduction to UNIX
    ‘C’ Programming for HPC Systems
    Message Passing Interface Programming
    OpenMP Programming
    Matlab
Support
  Help with the system, e.g., account and job submission problems.
  Help with system software, e.g., installation, updates and usage.
  Help with developing applications and porting of codes.
  Advice on starting up new HPC projects.
Application Areas – Evolutionary Biology
  Phylogeny is the study of evolutionary relatedness among various groups of organisms.
  QUB work focused on the taxonomy and ecology of Antarctic and deep-sea incirrate octopuses.
Application Areas – Evolutionary Biology
  PAUP is a tool for inferring and analysing phylogenetic trees.
  It uses a heuristic search approach involving subtree pruning and regrafting operations.
  Running PAUP over many replicates is very processor intensive, tying up a desktop for days.
Application Areas – Evolutionary Biology
  HPC solution
    Installed PAUP on the Windows Cluster.
    Used 1000 replicates packaged up in 15 lots of 63 reps and 1 lot of 55 reps, giving 16 input (.nex) files.
    Job creation and submission achieved via the Windows job manager.
    The 16 tasks took approx. 6 days to complete running on 16 cores.
  BENEFIT: able to process a larger number of replicates in parallel, freeing up the desktop and returning results quickly.
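The packaging of 1000 replicates into 16 lots can be sketched as follows. This is only an illustration of the arithmetic, not the script used at QUB; the function name `split_replicates` and its parameters are hypothetical.

```python
def split_replicates(total, max_lots):
    """Split `total` replicates into at most `max_lots` lots.

    Each lot is filled to the ceiling of total / max_lots, and the
    final lot takes whatever is left over.
    """
    lot_size = -(-total // max_lots)  # ceiling division
    lots = []
    remaining = total
    while remaining > 0:
        size = min(lot_size, remaining)
        lots.append(size)
        remaining -= size
    return lots


# 1000 bootstrap replicates spread over 16 input files
lots = split_replicates(1000, 16)
print(len(lots), lots)
```

With these inputs the result is 15 lots of 63 and one lot of 55, matching the figures on the slide. Each lot would then correspond to one input (.nex) file and one task for the job manager.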
Application Areas – Evolutionary Biology
  Other work:
    Other researchers are now looking to use PAUP.
    A similar application called POWSIM is currently being trialled.
    POWSIM is a program for assessing statistical power when testing for genetic differentiation.
    This is being used to study plant population genetics and evolution.
Evolutionary Biology user comments
  I just couldn't have bootstrapped this dataset without use of the cluster and whilst I might have been able to publish it (on the grounds that the computing power just wasn't available) - it wouldn't get in such a good journal - and we also wouldn't know how good the results were - which is a bit of a limitation in science when you're trying to determine the truth. It's a bit of extra work - but hey, the cluster has just saved me 15 x 6 days so I think I can cope! What I've actually achieved on this particular run is that I've sorted out a long standing taxonomic problem in deep-sea octopuses in the genus Graneledone and established that there really are two valid species in the North Pacific and they're separated by depth - and also that they probably evolved from the Atlantic species Graneledone verrucosa. Not totally irrelevant because this dating of nodes is what we were doing using BEAST on one of your other clusters. I no longer use the drop down menus - I write scripts ... anyway Vaughan has made me feel very comfortable so part of the success of this is definitely staff dependent! So, all in all, it has been a very positive experience for me - and I have been singing its praises around the department so I'm sure you will get future users - well, you definitely will, because I have a PhD student starting in October who will be using it.
Application Areas - Medicine
  High Resolution Cellular Imaging for Cancer Diagnosis
    This project faces several computational challenges, including the processing of thousands of high-resolution (30GB) microscopy images.
    A key requirement is rapid, high-throughput analysis of tissue microarrays (TMAs).
Application Areas - Medicine
  Framework written in C++ and using MPI
  Master/slave method of processing image tiles from the high-resolution image.
  An analysis algorithm developed in ‘C’ is applied to each ‘tile’.
  Job submission was written in C#.
  Tests have shown significant speedup:
    Cores   Time
    2       162.908
    4       65.3422
    32      7.12
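The master/slave tile processing described above might look roughly like the sketch below. It is not the project's C++/MPI framework: a Python thread pool stands in for the MPI worker ranks purely to show how tiles are handed out, and `make_tiles`, `analyse_tile`, `process_image` and the image dimensions are all hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def make_tiles(width, height, tile):
    """Return (x, y, w, h) boxes that cover a width x height image."""
    return [
        (x, y, min(tile, width - x), min(tile, height - y))
        for y in range(0, height, tile)
        for x in range(0, width, tile)
    ]


def analyse_tile(box):
    """Placeholder for the real analysis: returns the tile's pixel count."""
    x, y, w, h = box
    return w * h


def process_image(width, height, tile, n_workers):
    """Master: cut the image into tiles and hand them to the workers."""
    tiles = make_tiles(width, height, tile)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        # map() keeps the results in the same order as the tiles
        results = list(pool.map(analyse_tile, tiles))
    return tiles, results


tiles, results = process_image(1000, 600, 256, n_workers=3)
print(len(tiles), sum(results))
```

Because the tiles are independent, the master only has to distribute boxes and collect results, which is why this kind of workload tends to scale well as cores are added. In a real MPI version one rank would typically act as the master, so only the remaining ranks analyse tiles.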
Application Areas - Geography
  Rivers
    Aim is to predict flooding and search for a link to self-organised criticality (study of complexity in nature).
    Examination of the power-law frequency-magnitude distributions that best fit the datasets.
Application Areas - Geography
  Processing
    40 rivers being analysed.
    Each river is sampled in 3 places.
    Each sample is evaluated by (Monte Carlo) MATLAB statistical functions plvar and plpva.
    Each plvar function takes about 5.5 hrs to run.
    Each plpva function takes about 7 hrs to run.
    Impractical to run on a desktop!
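A rough estimate of the total desktop time implied by these figures is sketched below. It assumes, which the slide does not state, that every sample needs exactly one run of plvar and one run of plpva, executed one after another on a single machine.

```python
rivers = 40
samples_per_river = 3
plvar_hours = 5.5   # approximate run time quoted on the slide
plpva_hours = 7.0   # approximate run time quoted on the slide

samples = rivers * samples_per_river
total_hours = samples * (plvar_hours + plpva_hours)
total_days = total_hours / 24

print(samples, total_hours, total_days)
```

Under that assumption there are 120 samples needing about 1500 hours, or roughly 62.5 days of continuous computation, before allowing for any repeated runs.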
Application Areas - Geography
  HPC Solution
    MATLAB Distributed Computing Toolbox
    Parallelized the plvar and plpva functions so that their iterations are distributed across a number of processors on the Windows Cluster.
    Initial timings show that the plvar function, normally taking 5.5 hrs on a standard desktop, was reduced to 21 minutes using a MATLAB pool of 16 processors.
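The speedup implied by the plvar timings can be worked out as below. Treat it as indicative only: the baseline was a standard desktop rather than a single cluster processor, so the ratio mixes hardware differences with the gain from parallelism.

```python
serial_minutes = 5.5 * 60   # 5.5 hrs on a standard desktop
parallel_minutes = 21       # MATLAB pool of 16 processors
processors = 16

speedup = serial_minutes / parallel_minutes
efficiency = speedup / processors

print(round(speedup, 1), round(efficiency, 2))
```

This gives a speedup of about 15.7 on 16 processors, close to linear, which is what one would hope for when independent Monte Carlo iterations are spread across a pool.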
Geography user comments
  Before using the Windows Cluster I was faced with a major dilemma. The runs I needed to complete would each take over 5 hours to do and I was faced with the possibility of repeating this process countless times. I discussed the possibility of reducing my investigation or carrying on with methods that were faster but probably inaccurate, a decision I found very hard to make. Only by making use of the Windows cluster have I been able to successfully complete this integral part of my thesis.
Observations
  Notice no Physicists or Applied Mathematicians => new users are not programmers.
  Many users are familiar with the Windows XP desktop.
  Many users are tying up their desktops and restricting the amount of processing.
  The challenge is to identify these users and give them some handholding to get them started.
  Expecting to see more MATLAB Distributed Computing being done.
Future Hardware
  Anything for which there is a valid license
    Compute Clusters
      BEAST
      R
    Windows Compute Cluster
      PAUP
      POWSIM
      MATLAB
      ArcGIS
      ??
    Database Server
      T1D
Future
  Replace Harvey
  Expand XC
  Expand Windows Compute Cluster