Yasmin Shared memory programming Enno Rehling Universität Paderborn.


1 Yasmin Shared memory programming Enno Rehling Universität Paderborn

2 Overview
- Shared memory programming on SCI
- The Yasmin library
- Common pitfalls and caveats
- Conclusion and Outlook

3 SCI shared memory
- SCI can do more than message passing.
- The shared memory programming model is more intuitive.
- It’s more efficient, too: SCI offers hardware support for distributed shared memory programming.
- Remote memory and local memory have very different properties.

4 SCI shared memory
- Memory from remote nodes is mapped into the PCI address space.
- All processes can handle local and remote mapped memory in the same way.
[Figure: physical memory of Process A and Process B, each mapped through PCI]

5 Performance Data*
- Bandwidth:
  - writing at 65 MB/sec
  - reading at 1.7 MB/sec
- Latency:
  - 4.7 µsec (MPI, zero byte ping-pong)
  - 2.7 µsec (Yasmin, 4 byte ping-pong)
* Benchmarks were done on outdated hardware.
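To put the bandwidth figures in perspective, the sketch below estimates how long a transfer would take at the quoted write and read rates. The helper name `transfer_seconds`, the macro names, and the convention 1 MB = 1,000,000 bytes are assumptions made for illustration; only the two rates come from the slide, and latency is ignored.

```c
#include <assert.h>
#include <stddef.h>

/* Bandwidth figures quoted on the slide, in MB/sec. */
#define SCI_WRITE_MB_PER_SEC 65.0
#define SCI_READ_MB_PER_SEC   1.7

/* Hypothetical helper: seconds needed to move `bytes` at `mb_per_sec`,
 * assuming 1 MB = 1,000,000 bytes and ignoring per-transfer latency. */
static double transfer_seconds(size_t bytes, double mb_per_sec)
{
    assert(mb_per_sec > 0.0);
    return (double)bytes / (mb_per_sec * 1e6);
}
```

At these rates a remote read of a buffer takes roughly 38 times as long as a remote write of the same buffer (65 / 1.7), which is why later slides recommend avoiding remote reads.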

6 Yasmin: Shared memory API
- Developed at Paderborn
- API layer on top of the SCI driver
- Runs on Linux but not Solaris
- Reliable, fast and extensively tested

7 Distributed segments
- User creates and exports distributed segments

    void* foo(size_t block_size, sci_group_p group)
    {
        int procs = sci_get_groupsize(group);
        sci_distr_seg_p segment;
        void *base = NULL;
        /* one size entry per process in the group */
        size_t *sizes = malloc(procs * sizeof(size_t));
        if (sizes == NULL)
            return NULL;
        for (int i = 0; i != procs; ++i)
            sizes[i] = block_size;
        /* on return, base points to the start of the distributed segment */
        sci_create_distr_seg(group, &base, sizes, &segment);
        free(sizes);
        return base;
    }

[Figure: the distributed segment, starting at base]

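8 Consistent view
- Same view of shared segments for each process

    void* bar(sci_group_p group, size_t size)
    {
        int rank = sci_get_rank(group);
        int procs = sci_get_groupsize(group);
        /* one block of procs slots per process; char* allows byte arithmetic */
        char *base = foo(size*procs, group);
        /* build the message in slot `rank` of this process's own block */
        char *msg = base + size*(rank*procs + rank);
        do_some_work(msg, size);
        /* copy it into slot `rank` of every other process's block */
        for (int i = 0; i != procs; ++i)
            if (i != rank)
                memcpy(base + size*(i*procs + rank), msg, size);
        sci_barrier(group);
        /* after the barrier, the local block holds one message per rank */
        return base + size*rank*procs;
    }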
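The index arithmetic in bar() can be checked without any SCI hardware. The following is a single-process sketch that simulates the exchange in ordinary heap memory, assuming the blocks are laid out contiguously with `procs` slots of `size` bytes each, as the memcpy destination in bar() suggests. The names `slot_offset` and `simulate_exchange` are hypothetical and not part of Yasmin.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Byte offset of slot `slot` inside the block owned by process `owner`,
 * assuming contiguous blocks of `procs` slots, each `size` bytes. */
static size_t slot_offset(int owner, int slot, int procs, size_t size)
{
    assert(procs > 0 && owner >= 0 && owner < procs);
    assert(slot >= 0 && slot < procs);
    return size * ((size_t)owner * (size_t)procs + (size_t)slot);
}

/* Simulate the all-to-all exchange in one process: every rank fills its
 * slot in every block with the byte value rank + 1.
 * Returns the buffer (caller frees) or NULL if allocation fails. */
static unsigned char *simulate_exchange(int procs, size_t size)
{
    assert(procs > 0 && size > 0);
    unsigned char *base = calloc((size_t)procs * (size_t)procs, size);
    if (base == NULL)
        return NULL;
    for (int rank = 0; rank < procs; ++rank)
        for (int owner = 0; owner < procs; ++owner)
            memset(base + slot_offset(owner, rank, procs, size),
                   rank + 1, size);
    return base;
}
```

After the simulated exchange, slot r of every block carries the data written by rank r, which is the "same view" each process sees once the barrier has been passed.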

9 Allocating memory
- Allocation in both local and remote segments

    void foobar(char * s, sci_group_p group, char** msg[])
    {
        int rank = sci_get_rank(group);
        int procs = sci_get_groupsize(group);
        int error = sci_heap_create(4096);
        int next = (rank+1)%procs, prev = (rank+procs-1)%procs;
        char *dst = (char*)sci_heap_malloc(rank+1, strlen(s)+1);
        *msg[next] = strcpy(dst, s);
        sci_barrier(group);
        printf("message from %d: %s\n", prev, *msg[rank]);
        sci_heap_free(*msg[rank]);
    }

The same function, waiting for the message by polling instead of at a barrier:

    void foobar(char * s, sci_group_p group, char** msg[])
    {
        int rank = sci_get_rank(group);
        int procs = sci_get_groupsize(group);
        int error = sci_heap_create(4096);
        int next = (rank+1)%procs, prev = (rank+procs-1)%procs;
        char *dst = (char*)sci_heap_malloc(rank+1, strlen(s)+1);
        *msg[next] = strcpy(dst, s);
        while (*msg[rank]==NULL);
        printf("message from %d: %s\n", prev, *msg[rank]);
        sci_heap_free(*msg[rank]);
    }
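The `next` and `prev` expressions in foobar() arrange the processes in a ring. A minimal standalone sketch of that arithmetic follows; the function names `ring_next` and `ring_prev` are hypothetical, chosen here for illustration.

```c
#include <assert.h>

/* Successor of `rank` in a ring of `procs` processes. */
static int ring_next(int rank, int procs)
{
    assert(procs > 0 && rank >= 0 && rank < procs);
    return (rank + 1) % procs;
}

/* Predecessor of `rank`. Adding procs before subtracting 1 keeps the
 * left operand of % non-negative, so rank 0 wraps to procs - 1. */
static int ring_prev(int rank, int procs)
{
    assert(procs > 0 && rank >= 0 && rank < procs);
    return (rank + procs - 1) % procs;
}
```

With four processes, rank 3 sends to rank 0 and rank 0 receives from rank 3; writing `(rank - 1) % procs` instead would yield -1 for rank 0 in C.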

10 What else?
- Synchronization primitives
  - synchronization barriers
  - mutexes
  - reader/writer locks
  - condition variables
- Group operations
  - create static subgroups
  - all functions work on subgroups

11 Startup mechanism
- Hostfile
  - contains list of hostnames to run program on
  - defines number of processes per node on SMP
- Processes are created using rsh or ssh
  - output returned to shell or into file
  - easy debugging

12 Debugging SCI Applications

13 Profiling
- Diploma Thesis at Paderborn
- Dynamic program analysis
- Gathers information about access patterns of SCI programs, helps identify performance bottlenecks

14 SCI pitfalls and caveats
- Read access to remote memory is slow.
  - Use sci_memcpy(), not memcpy. MMX instructions boost performance by a factor of 2.
- Sometimes, access to remote memory can fail.
  - Hardware problems lead to errors
  - Driver functions can lead to errors
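Because a remote access can fail, code that copies into a remote segment may want to check the outcome and try again. The slides do not show how Yasmin reports such errors, so the sketch below is an assumption throughout: it uses a hypothetical callback type `transfer_fn` that returns 0 on success, and `flaky_copy` is a stand-in that fails a configurable number of times so the retry loop can be exercised without SCI hardware. Retrying is only one possible policy; a persistent hardware fault would need different handling.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical transfer callback: returns 0 on success, non-zero on error.
 * A real program would wrap the library's copy routine and error check. */
typedef int (*transfer_fn)(void *dst, const void *src, size_t n, void *ctx);

/* Call `fn` until it succeeds or `max_attempts` is exhausted.
 * Returns the number of attempts used on success, -1 on failure. */
static int transfer_with_retry(transfer_fn fn, void *dst, const void *src,
                               size_t n, void *ctx, int max_attempts)
{
    assert(fn != NULL && max_attempts > 0);
    for (int attempt = 1; attempt <= max_attempts; ++attempt)
        if (fn(dst, src, n, ctx) == 0)
            return attempt;
    return -1;
}

/* Stand-in transfer for experiments: ctx points to an int counting the
 * failures still to be simulated; once it reaches 0 the copy succeeds. */
static int flaky_copy(void *dst, const void *src, size_t n, void *ctx)
{
    int *failures_left = ctx;
    if (*failures_left > 0) {
        --*failures_left;
        return 1;
    }
    memcpy(dst, src, n);
    return 0;
}
```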

15 SCI pitfalls and caveats
- Awkward consistency model

    void awkward(volatile int *base)
    {
        *base = 7;
        sci_flush();
        *base = 2;
        fprintf(stdout, "%d\n", *base); // undefined
    }

- Programming without explicit knowledge of memory layout is never efficient
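One way to sidestep the undefined read-back in awkward() is to never read a value you have just written to a mapped segment, and to keep a local copy instead. This is a general technique, not something the slides attribute to Yasmin; the type `shadowed_int` and both functions are hypothetical names. Note that the shadow only reflects this process's own writes, so it does not help when another process updates the same location.

```c
#include <assert.h>
#include <stddef.h>

/* A mapped int paired with a local shadow of the last value stored. */
typedef struct {
    volatile int *remote; /* address inside the mapped segment */
    int shadow;           /* last value this process wrote */
} shadowed_int;

/* Write through to the mapped location and remember the value locally. */
static void shadowed_store(shadowed_int *v, int value)
{
    assert(v != NULL && v->remote != NULL);
    *v->remote = value;
    v->shadow = value;
}

/* Read back the last stored value without touching the mapped segment. */
static int shadowed_load(const shadowed_int *v)
{
    assert(v != NULL);
    return v->shadow;
}
```

This also avoids a slow remote read, at the cost of one extra int per shared variable.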

16 Conclusions
- Yasmin greatly simplifies use of shared segments
- Raw performance plus easier development
- Transparent shared memory programming on SCI is not yet achieved

17 Outlook
- Integration with CCS
- Yasmin goes open source
- Inclusion in SuSE cluster CD

