Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute). Anne Weill-Zrahia, Technion Computer Center, October 2008.


1 Nanco: a large HPC cluster for RBNI (Russell Berrie Nanotechnology Institute). Anne Weill-Zrahia, Technion Computer Center, October 2008

2 Resources needed for applications arising from nanotechnology
- Large memory – Tbytes
- High floating-point computing speed – Tflops
- High data throughput – state of the art …

3 SMP architecture
[Diagram: several processors (P) sharing a single memory]

4 Cluster architecture
[Diagram: nodes connected by an interconnection network]

5 Why not a cluster
- A single SMP system is easier to purchase and maintain
- Ease of programming on SMP systems

6 Why a cluster
- Scalability
- Total available physical RAM
- Reduced cost
- But …

7
- Having an application which exploits the parallel capabilities
- Studying the application or applications which will run on the cluster

8 Things to include in design
Property of code | Essential component
CPU bound | Fast computing unit
Memory bound | Large memory, fast access
Global flow of data in parallel app | Fast interconnect

9 Our choices
Property of code | Essential component | Choice
Computationally intensive, FP | Fast computing unit | 64-bit dual-core Opteron, Rev. F
Large matrices | Large memory, fast access | 8 GB/node
Finite element, spectral codes | Fast interconnect | InfiniBand DDR (20 Gb/s, low latency)

10 Other requirements
- Space, power and cooling constraints, strength of floors
- Software configuration:
  1. Operating system
  2. Compilers and application development tools
  3. Load balancing and job scheduling
  4. System management tools

11 Configuration
[Diagram: nodes with processors (P) and memory (M), connected through an InfiniBand switch]

12 Before finalizing our choice …
One should check, on a similar system:
- Single-processor peak performance
- InfiniBand interconnect performance
- SMP behaviour
- Behaviour of non-commercial parallel applications

13 Parallel applications issues
- Execution time
- Parallel speedup: Sp = T1/Tp
- Scalability
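
The speedup definition Sp = T1/Tp can be tried out with a short Python sketch. The timings below are hypothetical placeholders, not measurements from Nanco, and the efficiency Ep = Sp/p is the usual companion metric rather than something stated on the slide.

```python
def speedup(t1, tp):
    """Parallel speedup Sp = T1 / Tp (serial time over time on p processes)."""
    return t1 / tp


def efficiency(t1, tp, p):
    """Parallel efficiency Ep = Sp / p (assumed companion metric, not from the slide)."""
    return speedup(t1, tp) / p


# Hypothetical wall-clock times in seconds, keyed by number of processes.
timings = {1: 1000.0, 2: 520.0, 4: 270.0, 8: 150.0}

t1 = timings[1]
for p, tp in sorted(timings.items()):
    print(f"p={p:2d}  Sp={speedup(t1, tp):5.2f}  Ep={efficiency(t1, tp, p):4.2f}")
```

An efficiency that stays close to 1 as p grows is one way to read "scalability" in the list above.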

14 Benchmark design
- Must give a good estimate of the performance of your application
- Acceptance test: should match all its components

15 Comparison of performance (Lapack program, N=9000)
Computer | Nanco | Carmel
Performance | 3826.4 Mflops | 487 Mflops
Ratio of 7.8!
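
The quoted ratio can be checked directly from the two Mflops figures on this slide. This is only a sanity check of the arithmetic; the variable names are illustrative.

```python
# Mflops figures as quoted on the slide (Lapack program, N=9000).
NANCO_MFLOPS = 3826.4
CARMEL_MFLOPS = 487.0

ratio = NANCO_MFLOPS / CARMEL_MFLOPS
print(f"Nanco / Carmel = {ratio:.2f}")  # the slide quotes this as 7.8
```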

16 Execution time of Monte-Carlo parallel code (MPI)
Processes | Nanco | Carmel
1 | 4389 (~1 hr) | 22042 (~6 hrs!)
1739122462 1154.848094 642.1235408 282.516

17

18 What did work
- Running MPI code interactively
- Running a serial job through the queue
- Compiling C code with MPI

19 What did not work
- Compiling F90 or C++ code with MPI
- Running MPI code through the queue
- Queues do not do accounting per CPU

20 Parallel performance results
- Theoretical peak: 2.1 Tflops
- Nanco performance on HPL: 0.58 Tflops
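
Dividing the measured HPL figure by the theoretical peak gives the fraction of peak actually achieved. A minimal sketch using only the two numbers quoted on this slide (the function name is illustrative):

```python
PEAK_TFLOPS = 2.1   # theoretical peak quoted on the slide
HPL_TFLOPS = 0.58   # measured HPL result quoted on the slide


def hpl_efficiency(measured, peak):
    """Fraction of theoretical peak reached by the benchmark run."""
    return measured / peak


eff = hpl_efficiency(HPL_TFLOPS, PEAK_TFLOPS)
print(f"HPL efficiency: {eff:.1%}")  # roughly 28% of peak
```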

21 Comparison with Sun Benchmark

22 Execution time –comparison of compilers

23

24 Performance with different optimizations

25 Conclusions from acceptance tests
- New gcc (gcc4) is faster than Pathscale for some applications
- MPI collective communication functions are implemented differently in various MPI versions
- Disk access times are crucial: use attached storage when possible

26 Scheduling decisions
- Assessing priorities between user groups
- Assessing the parallel efficiency of different job types (MPI, serial, OpenMP, commercial software) and designing special queues for them
- Avoiding starvation by giving weight to the urgency parameter
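
One common way to avoid starvation is to let a job's priority grow with its time in the queue. The sketch below illustrates that idea only. It is not the Nanco scheduler configuration, and `Job`, `group_priority`, `wait_hours` and `urgency_weight` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    group_priority: float  # hypothetical base priority of the user group
    wait_hours: float      # time the job has already spent in the queue


def effective_priority(job, urgency_weight=1.0):
    """Base group priority plus a term that grows with waiting time."""
    return job.group_priority + urgency_weight * job.wait_hours


def pick_next(jobs, urgency_weight=1.0):
    """Return the queued job with the highest effective priority."""
    return max(jobs, key=lambda j: effective_priority(j, urgency_weight))


jobs = [
    Job("high_group_new", group_priority=10.0, wait_hours=0.0),
    Job("low_group_old", group_priority=1.0, wait_hours=12.0),
]

# With no urgency weight the low-priority job could wait indefinitely.
print(pick_next(jobs, urgency_weight=0.0).name)
# With a positive weight, the long-waiting job is eventually chosen.
print(pick_next(jobs, urgency_weight=1.0).name)
```

The size of the weight controls the trade-off: a larger value favours fairness over group priority.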

27 Observations during production mode
- Assessing users' understanding of the machine: support in writing scripts and in efficient parallelization
- Lack of visualization tools: a script was written to show current usage of the cluster

28 Utilization of cluster

29 Utilization of nanco sep08

30 Nanco jobs by type

31 Conclusion
- Correct benchmark design is crucial for testing the capabilities of the proposed architecture
- Acceptance tests make it possible to negotiate with vendors and give insight into future choices
- Only after several weeks of running the cluster at full capacity can we make informed decisions on its management

