1 Clusters Part 2 - Hardware Lars Lundberg The slides in this presentation cover Part 2 (Chapters 5-7) in Pfister’s book

2 Exposed vs. Enclosed Clusters
[Diagram: intra-cluster communication in an enclosed cluster vs. an exposed cluster]

3 Exposed Clusters
- The nodes must communicate by messages, since public standard communication is always message-based
- Communication has high overhead since it is based on standard protocols
- The communication channel itself is not secure, so additional work must be done to ensure the privacy of intra-cluster communication
- It is relatively easy to include computers that are spread out across a campus area or a company
- These clusters are easy to build. In fact, you do not have to build them at all; it is just a matter of running the right software.

4 Enclosed Clusters
- Communication can be by a number of means: shared disk, shared memory, messages, etc.
- It is possible to obtain communication with low overhead
- The security of the communication is implicit
- It is easier to implement cluster software on enclosed clusters, since security is not an issue and the cluster cannot be split into two parts that may have to be merged later

5 “Glass-House” vs. “Campus-Wide” Clusters
In the “glass-house” case the computers are fully dedicated to their use as shared computational resources and will therefore be located in a geographically compact arrangement (the glass house).
In the “campus-wide” case (also known as NOW - Network Of Workstations) the computers are located on the users’ desks. Campus-wide clusters operate in a less controlled environment and they must quickly and totally relinquish use of a node to a user.

6 The Four Categories of Cluster Hardware
- I/O-Attached Message-Based
- I/O-Attached Shared Storage
- Memory-Attached Shared Storage
- Memory-Attached Message-Based

7 I/O-Attached Message-Based
[Diagram: two nodes, each with processor, memory and I/O, connected through their I/O systems by a LAN, FDDI, ATM, etc.]

8 I/O-Attached Shared Storage
[Diagram: two nodes, each with processor, memory and I/O, attached to common storage through their I/O systems]

9 Memory-Attached Shared Storage (Global shared memory)
[Diagram: two nodes, each with processor, memory and I/O, both connected to a common shared memory]

10 Memory-Attached Shared Storage (Distributed shared memory)
[Diagram: two nodes, each with processor, memory and I/O, with their memories attached to each other]
This architecture can also be used for Memory-Attached Message-Based, even though no such systems are available at the moment.

11 I/O- vs. Memory-Attached
- I/O-attached message-passing is the only possibility for heterogeneous systems
- Memory attachment is in general harder than I/O attachment, for two reasons:
  - The hardware of most machines is designed to accept foreign attachments in its I/O system
  - The software for the basic memory-to-memory communication is more difficult to construct
- When memory attachment is operational it can potentially provide communication that is dramatically faster than that of I/O attachment

12 Shared Storage vs. Message-Based
- Shared storage is considered to be easier to use and program (Pfister is not only considering shared-disk clusters but also SMP computers)
- Message passing is considered to be more portable and scalable
- The hardware aspect is mainly a performance issue, whereas the programming model concerns the usability of the system; e.g., a shared-memory (or shared-disk) programming model can be provided without physically sharing the memory or disk.

13 Communication Requirements
- The required bandwidth between the cluster nodes obviously depends heavily on the workload.
- For I/O-intensive workloads the intra-cluster communication bandwidth should at least equal the aggregate bandwidth of all other I/O sources that each node has (see the toy calculation below).
- The bandwidth requirements are particularly difficult to meet in shared-nothing (message-based) clusters.
- A number of techniques have been developed for increasing the intra-cluster communication bandwidth (see Section 5.5 in Pfister’s book).
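As a rough illustration of the rule of thumb above, here is a toy calculation; the bandwidth figures are illustrative assumptions, not numbers from Pfister’s book:

    /* Toy calculation: a node's intra-cluster link should at least match
       the sum of the node's other I/O sources (figures below are made up). */
    #include <stdio.h>

    int main(void) {
        double disk_mb_s = 2 * 40.0;   /* two disk channels at 40 MB/s each */
        double lan_mb_s  = 12.5;       /* one 100 Mbit/s LAN adapter */
        printf("required intra-cluster bandwidth per node >= %.1f MB/s\n",
               disk_mb_s + lan_mb_s);
        return 0;
    }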

14 Symmetric Multiprocessors (SMPs)
[Diagram: several processors connected to a common memory and I/O system, with disk and LAN attached to the I/O system]

15 SMP Caches
[Diagram: as the previous SMP figure, but with a cache between each processor and the shared memory/I/O system]

16 NUMA Multiprocessors
[Diagram: several processor nodes, each with a processor, MMU and local memory, connected to each other]

17 CC-NUMA Multiprocessors
[Diagram: as the NUMA figure, but each processor node also has a cache (processor, cache, MMU, local memory)]

18 COMA Multiprocessors
[Diagram: as the CC-NUMA figure, but each node’s local memory is an attraction memory (processor, cache, MMU, attraction memory)]

19 Running serial programs on a cluster
- It is simple (almost trivial), but very useful, to run a number of serial jobs on a cluster. The relevant performance metric in this case is throughput.
- Three types of serial workloads can be distinguished:
  - Batch processing
  - Interactive logins, e.g. one can log onto a cluster without specifying a node. Useful in number-crunching applications with intermediate results
  - Multijob parallel, e.g. a sequence of coarse-grained jobs (almost the same as batch processing)

20 Running parallel programs on a cluster
We classify parallel programs into two categories:
- Programs that justify a large effort to make them run efficiently on a cluster, e.g.:
  - Grand challenge problems: global weather simulation etc.
  - Heavily used programs: DBMSs, LINPACK etc.
  - Academic research
- Programs where only a minimal effort is justified for making them run efficiently on a cluster, e.g.:
  - Database applications - use a parallel DBMS
  - Technical computing - use parallel LINPACK etc.
  - Programs that are parallelized automatically by the compiler

21 Amdahl’s Law
Total execution time = serial part + parallel part
If we use N processors (computers), the best we can hope for is the following:
Total execution time = serial part + (parallel part / N)
For instance, if the serial part is 5% of the total execution time, the best we can hope for is a speedup of 20, even if we use hundreds or thousands of processors.
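The bound can be worked out directly. A minimal sketch (not part of the original slides) that evaluates Amdahl’s formula, speedup(N) = 1 / (s + (1 - s)/N), for a 5% serial part:

    /* Amdahl's law: with serial fraction s, speedup(N) = 1 / (s + (1 - s)/N). */
    #include <stdio.h>

    int main(void) {
        double s = 0.05;                        /* serial part: 5% of the time */
        int n_values[] = {1, 10, 20, 100, 1000};
        for (int i = 0; i < 5; i++) {
            int n = n_values[i];
            printf("N = %4d  speedup = %5.2f\n", n, 1.0 / (s + (1.0 - s) / n));
        }
        return 0;                               /* approaches, but never exceeds, 20 */
    }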

22 Programming models
- Programs written to exploit SMP parallelism will not work (efficiently) on clusters
- Programs written to exploit message-based cluster parallelism will not work (efficiently) on SMPs
- Pfister has a long discussion about this in Chapter 9.

23 Serial program
    do forever
      max_change = 0;
      for y = 2 to N-1
        for x = 2 to N-1
          old_value = v[x,y]
          v[x,y] = (v[x-1,y] + v[x+1,y] + v[x,y-1] + v[x,y+1])/4
          max_change = max(max_change, abs(old_value - v[x,y]))
        end for x
      end for y
      if max_change < close_enough then leave do forever
    end do forever

24 Parallel program - first attempt
    do forever
      max_change = 0;
      forall y = 2 to N-1
        forall x = 2 to N-1
          old_value = v[x,y]
          v[x,y] = (v[x-1,y] + v[x+1,y] + v[x,y-1] + v[x,y+1])/4
          max_change = max(max_change, abs(old_value - v[x,y]))
        end forall x
      end forall y
      if max_change < close_enough then leave do forever
    end do forever

25 Parallel program - second attempt
    do forever
      max_change = 0;
      forall y = 2 to N-1
        forall x = 2 to N-1
          old_value = v[x,y]
          v[x,y] = (v[x-1,y] + v[x+1,y] + v[x,y-1] + v[x,y+1])/4
          acquire(max_change_lock)
          max_change = max(max_change, abs(old_value - v[x,y]))
          release(max_change_lock)
        end forall x
      end forall y
      if max_change < close_enough then leave do forever
    end do forever

26 Parallel program - third attempt
    do forever
      max_change = 0;
      forall y = 2 to N-1
        row_max = 0;
        for x = 2 to N-1
          old_value = v[x,y]
          v[x,y] = (v[x-1,y] + v[x+1,y] + v[x,y-1] + v[x,y+1])/4
          row_max = max(row_max, abs(old_value - v[x,y]))
        end for x
        acquire(max_change_lock)
        max_change = max(max_change, row_max)
        release(max_change_lock)
      end forall y
      if max_change < close_enough then leave do forever
    end do forever
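For comparison with the shared-memory attempts above, here is a minimal sketch of how the same relaxation might look in the message-based cluster model mentioned on slide 22. It is not from Pfister’s book; the MPI-style row partitioning, the ghost-row exchange and the simplified handling of the global boundary rows are illustrative assumptions:

    /* Message-passing sketch: each node owns a band of rows plus one ghost
       row above and below; a per-iteration boundary exchange and a reduction
       replace the shared max_change_lock.  Global edge handling and the
       initialisation of v are simplified/omitted. */
    #include <mpi.h>
    #include <math.h>
    #include <stdlib.h>

    #define N 512
    #define CLOSE_ENOUGH 0.001

    int main(int argc, char **argv) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        int rows = N / size;                     /* assume N divisible by size */
        double (*v)[N] = calloc(rows + 2, sizeof *v);
        double max_change;

        do {
            /* Exchange boundary rows with the neighbouring nodes. */
            if (rank > 0)
                MPI_Sendrecv(v[1], N, MPI_DOUBLE, rank - 1, 0,
                             v[0], N, MPI_DOUBLE, rank - 1, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            if (rank < size - 1)
                MPI_Sendrecv(v[rows], N, MPI_DOUBLE, rank + 1, 0,
                             v[rows + 1], N, MPI_DOUBLE, rank + 1, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            /* Local sweep, as in the serial program. */
            double local_max = 0.0;
            for (int y = 1; y <= rows; y++)
                for (int x = 1; x < N - 1; x++) {
                    double old = v[y][x];
                    v[y][x] = (v[y][x-1] + v[y][x+1] + v[y-1][x] + v[y+1][x]) / 4;
                    if (fabs(old - v[y][x]) > local_max)
                        local_max = fabs(old - v[y][x]);
                }

            /* Combine the per-node maxima; no shared lock is needed. */
            MPI_Allreduce(&local_max, &max_change, 1, MPI_DOUBLE, MPI_MAX,
                          MPI_COMM_WORLD);
        } while (max_change >= CLOSE_ENOUGH);

        free(v);
        MPI_Finalize();
        return 0;
    }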

27 Commercial programming models
For systems with a small number (< 16) of processors:
- Threads
- Processes that share a memory segment
For larger systems:
- Global I/O, i.e. all computers use the same file system
- RPC (Remote Procedure Calls)
- Global Locks
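A minimal sketch (not from the slides) of the first model in the list, threads protected by a global lock; the per-thread “work” is just a stand-in value:

    /* Threads model: several threads update a shared maximum under a lock,
       much like max_change_lock in the earlier relaxation slides. */
    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS 4

    static double shared_max = 0.0;
    static pthread_mutex_t max_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *worker(void *arg) {
        double local_max = (double)(long)arg;   /* stand-in for real work */
        pthread_mutex_lock(&max_lock);          /* the global lock */
        if (local_max > shared_max)
            shared_max = local_max;
        pthread_mutex_unlock(&max_lock);
        return NULL;
    }

    int main(void) {
        pthread_t t[NTHREADS];
        for (long i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, (void *)i);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);
        printf("shared_max = %.1f\n", shared_max);   /* prints 3.0 */
        return 0;
    }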

