
Slide 1: Introduction to Cluster Computing
Prabhaker Mateti, Wright State University, Dayton, Ohio, USA

Slide 2: Overview
- High performance computing
- High throughput computing
- NOW, HPC, and HTC
- Parallel algorithms
- Software technologies

Slide 3: "High Performance" Computing
- CPU clock frequency
- Parallel computers
- Alternate technologies
  - Optical
  - Bio
  - Molecular

Slide 4: "Parallel" Computing
- Traditional supercomputers
  - SIMD, MIMD, pipelines
  - Tightly coupled shared memory
  - Bus-level connections
  - Expensive to buy and to maintain
- Cooperating networks of computers

Slide 5: "NOW" Computing
- Workstation
- Network
- Operating System
- Cooperation
- Distributed (Application) Programs

Slide 6: Traditional Supercomputers
- Very high starting cost
  - Expensive hardware
  - Expensive software
- High maintenance
- Expensive to upgrade

Slide 7: Traditional Supercomputers
No one is predicting their demise, but ...

Slide 8: Computational Grids are the future

Slide 9: Computational Grids
"Grids are persistent environments that enable software applications to integrate instruments, displays, computational and information resources that are managed by diverse organizations in widespread locations."

Slide 10: Computational Grids
- Individual nodes can be supercomputers, or NOW
- High availability
- Accommodate peak usage
- LAN : Internet :: NOW : Grid

Slide 11: "NOW" Computing
- Workstation
- Network
- Operating System
- Cooperation
- Distributed+Parallel Programs

Slide 12: "Workstation Operating System"
- Authenticated users
- Protection of resources
- Multiple processes
- Preemptive scheduling
- Virtual memory
- Hierarchical file systems
- Network centric

Slide 13: Network
- Ethernet
  - 10 Mbps: obsolete
  - 100 Mbps: almost obsolete
  - 1000 Mbps: standard
- Protocols
  - TCP/IP

Slide 14: Cooperation
- Workstations are "personal"
- Use by others
  - slows you down
  - increases privacy risks
  - decreases security
- ... willing to share
- ... willing to trust

Slide 15: Distributed Programs
- Spatially distributed programs
  - A part here, a part there, ...
  - Parallel
  - Synergy
- Temporally distributed programs
  - Finish the work of your "great grandfather"
  - Compute half today, half tomorrow
  - Combine the results at the end
- Migratory programs
  - Have computation, will travel

Slide 16: SPMD
- Single program, multiple data
- Contrast with SIMD
- Same program runs on multiple nodes
- May or may not be lock-step
- Nodes may be of different speeds
- Barrier synchronization (see the sketch after this slide)
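As an illustration of the SPMD style, here is a minimal MPI sketch in C (not part of the original slides; the loop bound and the summation are invented for the example). Every node runs the same program, uses its rank to pick a different slice of the data, and all nodes, fast or slow, wait for one another at the barrier.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which copy am I?     */
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs); /* how many copies run? */

    /* Same program, different data: each rank sums its own slice. */
    long local_sum = 0;
    for (long i = rank; i < 1000000; i += nprocs)
        local_sum += i;

    /* Barrier synchronization: nobody proceeds until everyone is here. */
    MPI_Barrier(MPI_COMM_WORLD);

    printf("rank %d of %d: local_sum = %ld\n", rank, nprocs, local_sum);
    MPI_Finalize();
    return 0;
}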

Slide 17: Conceptual Bases of Distributed+Parallel Programs
- Spatially distributed programs
  - Message passing
- Temporally distributed programs
  - Shared memory
- Migratory programs
  - Serialization of data and programs

Slide 18: (Ordinary) Shared Memory
- Simultaneous read/write access
  - read : read
  - read : write
  - write : write
- Semantics not clean
  - Even when all processes are on the same processor
- Mutual exclusion (see the sketch after this slide)
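A concrete illustration of mutual exclusion over ordinary shared memory, as a POSIX-threads sketch in C (not from the slides; the counter, the thread count, and the loop bound are made up). Without the mutex, the concurrent read-modify-write of the counter is exactly the unclean write : write case above.

#include <pthread.h>
#include <stdio.h>

static long counter = 0;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 100000; i++) {
        pthread_mutex_lock(&lock);    /* enter the critical section        */
        counter++;                    /* read-modify-write, now serialized */
        pthread_mutex_unlock(&lock);  /* leave the critical section        */
    }
    return NULL;
}

int main(void)
{
    pthread_t t[4];
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, worker, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    printf("counter = %ld\n", counter);   /* always 400000 with the mutex */
    return 0;
}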

Slide 19: Distributed Shared Memory
- "Simultaneous" read/write access by spatially distributed processors
- Abstraction layer of an implementation built from message passing primitives
- Semantics not so clean

Slide 20: Conceptual Bases for Migratory Programs
- Same CPU architecture
  - x86, PowerPC, MIPS, SPARC, ..., JVM
- Same OS + environment
- Be able to "checkpoint" (a toy sketch follows this slide)
  - suspend, and
  - then resume computation
  - without loss of progress
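A toy application-level sketch of the checkpoint idea in C (not from the slides; systems such as Condor checkpoint transparently, whereas here the program saves and restores its own state, and the file name and loop are invented):

#include <stdio.h>

int main(void)
{
    long i = 0, sum = 0;

    /* Resume: if a checkpoint file exists, restore the saved state. */
    FILE *f = fopen("checkpoint.dat", "r");
    if (f) {
        if (fscanf(f, "%ld %ld", &i, &sum) != 2) { i = 0; sum = 0; }
        fclose(f);
    }

    for (; i < 100000000L; i++) {
        sum += i;
        if (i % 10000000L == 0) {               /* periodic checkpoint */
            f = fopen("checkpoint.dat", "w");
            if (f) { fprintf(f, "%ld %ld\n", i + 1, sum); fclose(f); }
        }
    }

    printf("sum = %ld\n", sum);
    remove("checkpoint.dat");   /* computation finished; discard checkpoint */
    return 0;
}

If the process is killed and restarted, possibly on another node with the same architecture and environment, it continues from the last saved iteration rather than from zero.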

Slide 21: Clusters of Workstations
- Inexpensive alternative to traditional supercomputers
- High availability
  - Lower down time
  - Easier access
- Development platform, with production runs on traditional supercomputers

Slide 22: Cluster Characteristics
- Commodity off-the-shelf hardware
- Networked
- Common home directories
- Open source software and OS
- Support for message passing programming
- Batch scheduling of jobs
- Process migration

Slide 23: Why are Linux Clusters Good?
- Low initial implementation cost
  - Inexpensive PCs
  - Standard components and networks
  - Free software: Linux, GNU, MPI, PVM
- Scalability: can grow and shrink
- Familiar technology: easy for users to adopt, use, and maintain

Slide 24: Example Clusters
- July 1999
- 1000 nodes
- Used for genetic algorithm research by John Koza, Stanford University
- www.genetic-programming.com/

Slide 25: Largest Cluster System
- IBM BlueGene, 2007
- DOE/NNSA/LLNL
- Memory: 73,728 GB
- OS: CNK/SLES 9
- Interconnect: proprietary
- Processor: PowerPC 440
- 106,496 nodes
- 478.2 teraflops on LINPACK

Slide 26: OS Share of Top 500 (Nov 2007)
[Table: operating-system share of the Top 500 list, with columns OS, Count, Share, Rmax (GF), Rpeak (GF), and Processors, and rows for Linux, Windows, Unix, BSD, Mixed, MacOS, and Totals; the numeric entries did not survive transcription. Linux accounted for the large majority of the 500 systems.]

Slide 27: Development of Distributed+Parallel Programs
- New code + algorithms
- Old programs rewritten in new languages that have distributed and parallel primitives
- Parallelize legacy code

Slide 28: New Programming Languages
- With distributed and parallel primitives
- Functional languages
- Logic languages
- Data flow languages

Slide 29: Parallel Programming Languages
- Based on the shared-memory model
- Based on the distributed-memory model
- Parallel object-oriented languages
- Parallel functional programming languages
- Concurrent logic languages

Slide 30: Condor
- Cooperating workstations: come and go
- Migratory programs
  - Checkpointing
  - Remote I/O
- Resource matching

Slide 31: Portable Batch System (PBS)
- Prepare a .cmd file (a sample sketch follows this slide)
  - naming the program and its arguments
  - properties of the job
  - the needed resources
- Submit the .cmd file to the PBS Job Server with the qsub command
- Routing and scheduling: the Job Server
  - examines the .cmd details to route the job to an execution queue
  - allocates one or more cluster nodes to the job
  - communicates with the Execution Servers (MOMs) on the cluster to determine the current state of the nodes
  - when all of the needed nodes are allocated, passes the .cmd on to the Execution Server on the first allocated node (the "mother superior")
- Execution Server
  - logs in on the first node as the submitting user and runs the .cmd file in the user's home directory
  - runs an installation-defined prologue script
  - gathers the job's output to standard output and standard error
  - runs an installation-defined epilogue script
  - delivers stdout and stderr to the user
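A hypothetical example of such a .cmd file (not from the slides; the job name, resource amounts, file names, and program are invented, while the #PBS directive keywords are the standard PBS/TORQUE ones). The directives carry the job's properties and resource requests; the shell commands name the program and its arguments:

#!/bin/sh
#PBS -N my_job                # job name
#PBS -l nodes=2:ppn=4         # request 2 cluster nodes, 4 processors each
#PBS -l walltime=01:00:00     # maximum run time
#PBS -o my_job.out            # file that receives the job's stdout
#PBS -e my_job.err            # file that receives the job's stderr

cd $PBS_O_WORKDIR             # directory from which qsub was invoked
./my_program arg1 arg2        # the program and its arguments

It would be submitted with "qsub my_job.cmd", after which the Job Server routes, schedules, and runs it as described above.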

Slide 32: TORQUE, an Open Source PBS
- Tera-scale Open-source Resource and QUEue manager (TORQUE) enhances OpenPBS
- Fault tolerance
  - Additional failure conditions checked/handled
  - Node health check script support
- Scheduling interface
- Scalability
  - Significantly improved server-to-MOM communication model
  - Ability to handle larger clusters (over 15 TF / 2,500 processors)
  - Ability to handle larger jobs (over 2,000 processors)
  - Ability to support larger server messages
- Logging

Slide 33: OpenMP for Shared Memory
- Shared-memory programming API (on clusters, implementable over distributed shared memory)
- User gives hints as directives to the compiler (see the sketch after this slide)
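A small C sketch of the directive style (not from the slides; the loop and the reduction are invented for the illustration). The pragma is the hint: it tells the compiler the iterations are independent, and the runtime spreads them across threads that share memory.

#include <omp.h>
#include <stdio.h>

int main(void)
{
    const int n = 1000000;
    double sum = 0.0;

    /* The directive is the programmer's hint; reduction(+:sum)
       makes the concurrent updates to sum safe. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++)
        sum += 1.0 / (i + 1);

    printf("partial harmonic sum = %f (up to %d threads)\n",
           sum, omp_get_max_threads());
    return 0;
}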

Slide 34: Message Passing Libraries
- Programmer is responsible for initial data distribution, synchronization, and sending and receiving information
- Parallel Virtual Machine (PVM)
- Message Passing Interface (MPI)
- Bulk Synchronous Parallel model (BSP)

Slide 35: BSP: Bulk Synchronous Parallel Model
- Divides computation into supersteps
- In each superstep a processor can work on local data and send messages
- At the end of the superstep, a barrier synchronization takes place and all processors receive the messages that were sent in the previous superstep

Slide 36: BSP Library
- Small number of subroutines to implement
  - process creation,
  - remote data access, and
  - bulk synchronization
- Linked to C, Fortran, ... programs (a C sketch follows this slide)
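A minimal C sketch in the BSPlib style (assuming BSPlib is the library meant; the routines bsp_begin, bsp_end, bsp_pid, bsp_nprocs, and bsp_sync are BSPlib's, and remote data access with bsp_put/bsp_get is left out to keep the sketch short):

#include <bsp.h>
#include <stdio.h>

int main(void)
{
    bsp_begin(bsp_nprocs());     /* process creation: start all processes */

    int p = bsp_pid();           /* this process's id   */
    int n = bsp_nprocs();        /* number of processes */

    /* Superstep 1: purely local work (remote puts/gets would go here). */
    long local = (long)p * 1000;

    bsp_sync();                  /* bulk synchronization ends the superstep */

    /* Superstep 2: continue, knowing every process finished superstep 1. */
    printf("process %d of %d computed %ld\n", p, n, local);

    bsp_end();
    return 0;
}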

Slide 37: BSP: Bulk Synchronous Parallel Model
Book: Rob H. Bisseling, "Parallel Scientific Computation: A Structured Approach using BSP and MPI," Oxford University Press, 2004, 324 pages.

Slide 38: PVM and MPI
- Message passing primitives
- Can be embedded in many existing programming languages
- Architecturally portable
- Open-source implementations

Slide 39: Parallel Virtual Machine (PVM)
- PVM enables a heterogeneous collection of networked computers to be used as a single large parallel computer
- Older than MPI
- Large scientific/engineering user community

Slide 40: Message Passing Interface (MPI)
- MPI-2.0
- MPICH: by Argonne National Laboratory and Mississippi State University, www.mcs.anl.gov/mpi/mpich/
- LAM
(A point-to-point send/receive sketch in C follows this slide.)
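A hedged sketch of MPI point-to-point message passing in C (not from the slides; the buffer contents and message tag are arbitrary): rank 0 sends an array to rank 1, which receives and prints it. Run it with at least two processes, for example with mpirun -np 2.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double buf[4] = {1.0, 2.0, 3.0, 4.0};

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* The programmer decides what is sent, to whom, and when. */
        MPI_Send(buf, 4, MPI_DOUBLE, 1, 99, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(buf, 4, MPI_DOUBLE, 0, 99, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %g %g %g %g\n",
               buf[0], buf[1], buf[2], buf[3]);
    }

    MPI_Finalize();
    return 0;
}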

Slide 41: Kernels Etc Mods for Clusters
- Dynamic load balancing
- Transparent process migration
- Kernel mods
  - CLuster Membership Subsystem ("CLMS") and
  - Internode Communication Subsystem
- GlusterFS: clustered file storage of petabytes
- GlusterHPC: high performance compute clusters
- Open-source software for volunteer computing and grid computing
- Condor clusters

Slide 42: More Information on Clusters
- IEEE Task Force on Cluster Computing
- "a central repository of links and information regarding Linux clustering, in all its forms."
- Resources for clusters built on commodity hardware deploying the Linux OS and open source software
- "Authoritative resource for information on Linux Compute Clusters and Linux High Availability Clusters."
- "To provide education and advanced technical training for the deployment and use of Linux-based computing clusters to the high-performance computing community worldwide."

Slide 43: References
- Cluster hardware setup: owulf_book/beowulf_book.pdf
- PVM
- MPI
- Condor

