1  ISCM-10 -- Taub Computing Center: High Performance Computing for Computational Mechanics. Moshe Goldberg, March 29, 2001

2  High Performance Computing for CM -- Agenda:
   1) Overview
   2) Alternative Architectures
   3) Message Passing
   4) "Shared Memory"
   5) Case Study

3  1) High Performance Computing - Overview

4  Some Important Points
   * Understanding HPC concepts
   * Why should programmers care about the architecture?
   * Do compilers make the right choices?
   * Nowadays, there are alternatives

5  Trends in computer development
   * Speed of calculation is steadily increasing
   * Memory may not be in balance with high calculation speeds
   * Workstations are approaching the speeds of specially designed high-performance machines
   * Are we approaching the limit of the speed of light?
   * To get an answer faster, we must perform calculations in parallel

6  Some HPC concepts
   * HPC
   * HPF / Fortran90
   * cc-NUMA
   * Compiler directives
   * OpenMP
   * Message passing
   * PVM / MPI
   * Beowulf

7-13  (figures only; no transcript available)

14  2) Alternative Architectures

15  (figure) Source: IDC, 2001

16  (figure) Source: IDC, 2001

17  (figure only; no transcript available)

18  IUCC (Machba) computers (Mar 2001)
    Cray J90 -- 32 cpu, memory 4 GB (500 MW)
    Origin2000 -- 112 cpu (R12000, 400 MHz), 28.7 GB total memory
    PC cluster -- 64 cpu (Pentium III, 550 MHz), total memory 9 GB

19  (figure) Chris Hempel, hpc.utexas.edu

20  (figure only; no transcript available)

21  (figure) Chris Hempel, hpc.utexas.edu

22  Symmetric Multiple Processors: several CPUs share one memory over a common memory bus.
    Examples: SGI Power Challenge, Cray J90/T90

23  Distributed Parallel Computing: each CPU has its own local memory; nodes communicate over an interconnect.
    Examples: IBM SP2, Beowulf

24-26  (figures only; no transcript available)

27  3) Message Passing

28  MPI commands -- examples:

      call MPI_SEND(sum,1,MPI_REAL,ito,itag,MPI_COMM_WORLD,ierror)
      call MPI_RECV(sum,1,MPI_REAL,ifrom,itag,MPI_COMM_WORLD,istatus,ierror)

29  Some basic MPI functions
    Setup: mpi_init, mpi_finalize
    Environment: mpi_comm_size, mpi_comm_rank
    Communication: mpi_send, mpi_recv
    Synchronization: mpi_barrier
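As a minimal sketch (not part of the original slides), the basic functions above can be combined into a small point-to-point program; the program name, loop, and variable roles here are illustrative assumptions:

c     Minimal MPI sketch (illustrative only): each non-root process
c     sends one real value to process 0, which receives them in turn.
      program basic_mpi
      implicit none
      include 'mpif.h'
      integer ierror, rank, nprocs, i, itag
      integer istatus(MPI_STATUS_SIZE)
      real sum
      itag = 1
      call MPI_INIT(ierror)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      call MPI_COMM_SIZE(MPI_COMM_WORLD, nprocs, ierror)
      sum = real(rank)
      if (rank .ne. 0) then
c        workers send their value to the root
         call MPI_SEND(sum, 1, MPI_REAL, 0, itag,
     &                 MPI_COMM_WORLD, ierror)
      else
c        the root receives one value from every other process
         do i = 1, nprocs-1
            call MPI_RECV(sum, 1, MPI_REAL, i, itag,
     &                    MPI_COMM_WORLD, istatus, ierror)
            write(*,*) 'received', sum, 'from rank', i
         enddo
      endif
      call MPI_FINALIZE(ierror)
      end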

30  Other important MPI functions
    Asynchronous communication: mpi_isend, mpi_irecv, mpi_iprobe, mpi_wait/nowait
    Collective communication: mpi_barrier, mpi_bcast, mpi_gather, mpi_scatter, mpi_reduce, mpi_allreduce
    Derived data types: mpi_type_contiguous, mpi_type_vector, mpi_type_indexed, mpi_type_pack, mpi_type_commit, mpi_type_free
    Creating communicators: mpi_comm_dup, mpi_comm_split, mpi_intercomm_create, mpi_comm_free
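A hedged sketch of one collective call, mpi_reduce, summing a per-process value onto rank 0; it assumes mpi_init has already been called and mpif.h is included, and the variable names are invented for illustration:

c     Illustrative collective communication with MPI_REDUCE:
c     every rank contributes 'mypart'; rank 0 receives the global sum.
      real mypart, total
      integer ierror, rank
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierror)
      mypart = real(rank + 1)
      call MPI_REDUCE(mypart, total, 1, MPI_REAL, MPI_SUM,
     &                0, MPI_COMM_WORLD, ierror)
      if (rank .eq. 0) write(*,*) 'global sum =', total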

31  4) “Shared Memory”

32  Fortran directives -- examples

    CRAY:
CMIC$ DO ALL
      do i=1,n
         a(i)=i
      enddo

    SGI:
C$DOACROSS
      do i=1,n
         a(i)=i
      enddo

    OpenMP:
C$OMP parallel do
      do i=1,n
         a(i)=i
      enddo

33  OpenMP Summary
    OpenMP standard -- first published Oct 1997
    Directives
    Run-time library routines
    Environment variables
    Versions for f77, f90, C, C++

34  OpenMP Summary -- Parallel Do Directive

c$omp parallel do private(I) shared(a)
      do I=1,n
         a(I) = I+1
      enddo
c$omp end parallel do        <-- optional

35  OpenMP Summary -- Defining a Parallel Region: Individual Do Loops

c$omp parallel shared(a,b)
c$omp do private(j)
      do j=1,n
         a(j)=j
      enddo
c$omp end do nowait
c$omp do private(k)
      do k=1,n
         b(k)=k
      enddo
c$omp end do
c$omp end parallel

36  OpenMP Summary -- Parallel Do Directive: Clauses
    shared
    private
    default(private|shared|none)
    reduction({operator|intrinsic}:var)
    if(scalar_logical_expression)
    ordered
    copyin(var)
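For illustration only (not from the deck), the reduction and if clauses might be combined like this; the array x and size n are assumed to be declared elsewhere:

c     Illustrative use of the reduction and if clauses: the sum s is
c     reduced across threads; the loop runs serially if n is small.
      real s
      integer i
      s = 0.0
c$omp parallel do reduction(+:s) if(n.gt.1000)
      do i=1,n
         s = s + x(i)
      enddo
c$omp end parallel do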

37  OpenMP Summary -- Run-Time Library Routines: Execution Environment
    omp_set_num_threads
    omp_get_num_threads
    omp_get_max_threads
    omp_get_thread_num
    omp_get_num_procs
    omp_set_dynamic / omp_get_dynamic
    omp_set_nested / omp_get_nested
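A small illustrative fragment (not in the original slides) using a few of these routines to report each thread's id and the team size:

c     Illustrative sketch: request four threads, then query the thread
c     id and team size inside a parallel region.
      integer omp_get_thread_num, omp_get_num_threads
      call omp_set_num_threads(4)
c$omp parallel
      write(*,*) 'thread', omp_get_thread_num(),
     &           ' of', omp_get_num_threads()
c$omp end parallel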

38  OpenMP Summary -- Run-Time Library Routines: Lock Routines
    omp_init_lock
    omp_destroy_lock
    omp_set_lock
    omp_unset_lock
    omp_test_lock
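A hedged sketch of the lock routines guarding a serial section; it assumes the omp_lib module (or an equivalent include file) supplies omp_lock_kind:

c     Illustrative lock sketch: only one thread at a time executes the
c     guarded section between set and unset.
      use omp_lib
      integer(omp_lock_kind) :: lck
      call omp_init_lock(lck)
c$omp parallel
      call omp_set_lock(lck)
c     ... work that must not run concurrently ...
      call omp_unset_lock(lck)
c$omp end parallel
      call omp_destroy_lock(lck)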

39  OpenMP Summary -- Environment Variables
    OMP_NUM_THREADS
    OMP_DYNAMIC
    OMP_NESTED

40-41  RISC memory levels -- single CPU: CPU, cache, main memory (diagram)

42-44  RISC memory levels -- multiple CPUs: CPU 0 and CPU 1 each have their own cache and share main memory (diagram)

45  A sample program

      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
      do i=1,n
         a=x1(i)*x2(i); b=y1(i)*y2(i)
         c=x1(i)*y2(i); d=x2(i)*y1(i)
         z1(i)=a-b; z2(i)=c+d
      enddo
      end

46  A sample program (with an OpenMP directive added)

      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
c$omp parallel do
      do i=1,n
         a=x1(i)*x2(i); b=y1(i)*y2(i)
         c=x1(i)*y2(i); d=x2(i)*y1(i)
         z1(i)=a-b; z2(i)=c+d
      enddo
      end

47  A sample program -- is this running in parallel?
    Run on Technion Origin2000
    Vector length = 1,000,000; loop repeated 50 times
    Compiler optimization: low (-O1)

    Elapsed time, sec (1 / 2 / 4 threads):
      No-parallel compile:  15.0   15.3
      Parallel compile:     16.0   26.0   26.8

48  A sample program -- is this running in parallel?  WHY NOT?
    (same run and timings as slide 47)

49  A sample program -- why is it not running in parallel?

c$omp parallel do
      do i=1,n
         a=x1(i)*x2(i); b=y1(i)*y2(i)
         c=x1(i)*y2(i); d=x2(i)*y1(i)
         z1(i)=a-b; z2(i)=c+d
      enddo

    Answer: by default, the variables a, b, c, d are SHARED

50  A sample program -- this is now running in parallel
    Solution: declare a, b, c, d as PRIVATE:

c$omp parallel do private(a,b,c,d)

    Elapsed time, sec (1 / 2 / 4 threads):
      No-parallel compile:  15.0   15.3
      Parallel compile:     16.0    8.5    4.6
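Combining the routine from slide 46 with the fix on slide 50, the corrected code would look like this (a reconstruction, not shown in full in the deck):

c     Corrected sample program: the scalar temporaries are now private,
c     so each thread gets its own copy and iterations are independent.
      subroutine xmult (x1,x2,y1,y2,z1,z2,n)
      real x1(n),x2(n),y1(n),y2(n),z1(n),z2(n)
      real a,b,c,d
c$omp parallel do private(a,b,c,d)
      do i=1,n
         a=x1(i)*x2(i); b=y1(i)*y2(i)
         c=x1(i)*y2(i); d=x2(i)*y1(i)
         z1(i)=a-b; z2(i)=c+d
      enddo
      end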

51  5) Case Study

52  HPC in the Technion
    SGI Origin2000 -- 22 cpu (R10000, 250 MHz), total memory 5.6 GB
    PC cluster (Linux Red Hat 6.1) -- 6 cpu (Pentium II, 400 MHz), memory 500 MB/cpu

53  Fluent test case -- Stability of a subsonic turbulent jet
    Source: Viktoria Suponitsky, Faculty of Aerospace Engineering, Technion

54  (figure only; no transcript available)

55  Fluent test case -- 10 time steps, 20 iterations per time step

    Reading "Case25unstead.cas"...
    10000 quadrilateral cells, zone 1, binary.
    19800 2D interior faces, zone 9, binary.
    50 2D wall faces, zone 3, binary.
    100 2D pressure-inlet faces, zone 7, binary.
    50 2D pressure-outlet faces, zone 5, binary.
    50 2D pressure-outlet faces, zone 6, binary.
    50 2D velocity-inlet faces, zone 2, binary.
    100 2D axis faces, zone 4, binary.
    10201 nodes, binary.
    10201 node flags, binary.

56-57  (figures only; no transcript available)

58  Fluent test case -- SMP
    Command: fluent 2d -t8 -psmpi -g < inp

    Host spawning Node 0 on machine "parix".
    ID    Comm.  Hostname  O.S.   PID    Mach ID  HW ID  Name
    -------------------------------------------------------------
    host  net    parix     irix   19732  0        7      Fluent Host
    n7    smpi   parix     irix   19776  0        7      Fluent Node
    n6    smpi   parix     irix   19775  0        6      Fluent Node
    n5    smpi   parix     irix   19771  0        5      Fluent Node
    n4    smpi   parix     irix   19770  0        4      Fluent Node
    n3    smpi   parix     irix   19772  0        3      Fluent Node
    n2    smpi   parix     irix   19769  0        2      Fluent Node
    n1    smpi   parix     irix   19768  0        1      Fluent Node
    n0*   smpi   parix     irix   19767  0        0      Fluent Node

59  Fluent test case -- cluster
    Command: fluent 2d -cnf=clinux1,clinux2,clinux3,clinux4,clinux5,clinux6 -t6 -pnet -g < inp

    Node 0 spawning Node 5 on machine "clinux6".
    ID    Comm.  Hostname  O.S.        PID    Mach ID  HW ID  Name
    -----------------------------------------------------------
    n5    net    clinux6   linux-ia32  3560   5        9      Fluent Node
    n4    net    clinux5   linux-ia32  19645  4        8      Fluent Node
    n3    net    clinux4   linux-ia32  16696  3        7      Fluent Node
    n2    net    clinux3   linux-ia32  17259  2        6      Fluent Node
    n1    net    clinux2   linux-ia32  18328  1        5      Fluent Node
    host  net    clinux1   linux-ia32  10358  0        3      Fluent Host
    n0*   net    clinux1   linux-ia32  10400  0        -1     Fluent Node

60-61  (figures only; no transcript available)

62  (figure) TOP500 (November 2, 2000)

63  (figure) TOP500 (November 2, 2000)

