1 CMPE 511 HIGH PERFORMANCE COMPUTING CLUSTERS. Dilek Demirel İşçi

2 Motivation. Why is so much computation needed?
- Genetic engineering: searching for matching DNA patterns in large DNA banks.
- Cosmology: simulating very complex systems, such as the formation of a galaxy.
- Climate: solving very high precision floating-point calculations, simulating chaotic systems.
- Financial modeling and commerce: simulating chaotic systems, much like the climate modeling problem.
- Cryptography: searching very large state spaces to find a cryptographic key; factoring very large numbers.
- Software: searching large state spaces to evaluate and verify software.

3 Ways to improve performance:
- Work harder: hardware improvements.
- Work smarter: better algorithms.
- Get help: parallelism.

4 Moore's Law
- Moore's Law: processing capacity doubles roughly every 18 months.
- Performance improvements in high performance computing? Higher than Moore's Law would predict.
- Why? Parallelism.

5 Types of Parallel Architectures

6 Flynn's taxonomy:
- Single Instruction Single Data (SISD)
- Single Instruction Multiple Data (SIMD)
- Multiple Instruction Single Data (MISD)
- Multiple Instruction Multiple Data (MIMD)

7 Single Instruction Single Data (SISD)
- No parallelism.
- A simple single processor.

8 Single Instruction Multiple Data (SIMD)
- A single instruction is executed by multiple processing units on different data streams.
- There is a single instruction memory and multiple data memory units.
- Vector architectures are of the SIMD type.
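As a rough illustration (added here, not from the slides), the loop below applies the same operation across all elements of two arrays. On a SIMD or vector machine, one instruction processes several elements per step, and a vectorizing compiler can map exactly this kind of loop onto such instructions.

#include <stdio.h>

#define N 8

int main(void)
{
    float a[N] = {1, 2, 3, 4, 5, 6, 7, 8};
    float b[N] = {8, 7, 6, 5, 4, 3, 2, 1};
    float c[N];

    /* One logical operation (c = a + b) applied across all elements.
       On a SIMD machine a single instruction handles several elements
       of a and b at once instead of one at a time. */
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    for (int i = 0; i < N; i++)
        printf("%.1f ", c[i]);
    printf("\n");
    return 0;
}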

9 Multiple Instruction Single Data (MISD)
- Not commercially built.
- Refers to a structure in which a single data stream is operated on by different functional units.

10 Multiple Instruction Multiple Data (MIMD)
- Each processor has its own instruction and data memory.
- This is the type we are interested in.

11 High Performance Computing Techniques
- Supercomputers: custom built, with optimized processors and shared memory processing (SMP); suited to problems that cannot be parallelized.
- Clusters: consist of more than one computer, use parallelism and distributed memory processing.
- Grid systems: "the Internet is the computer"; no geographical limitations.

12 The main idea in cluster architectures: combine the old idea of parallel computing (physical clustering of general-purpose hardware plus the message passing of distributed computing) with new low-cost technology (mass-market COTS PCs and networking products).

13 What is a Cluster?

14 Many definitions:
- A cluster is two or more independent computers connected by a dedicated network to perform a joint task.
- A cluster is a group of servers that coordinate their actions to provide scalable, highly available services.

15 A cluster is a type of parallel and distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.

16 Basic cluster
- Multiple computing nodes: low cost, each a fully functioning computer with its own memory, CPU, possibly storage, and its own instance of the operating system.
- Computing nodes connected by interconnects, typically low cost, high bandwidth, and low latency.
- Permanent, high-performance data storage.
- A resource manager to distribute and schedule jobs.
- The middleware that allows the computers to act as a distributed or parallel system.
- Parallel applications designed to run on it.

17 A cluster architecture
- Computing nodes and master nodes.
- High-speed interconnect, for example:
  - Myricom: 1.28 Gbps in each direction.
  - IEEE SCI: latency under 2.5 microseconds, 3.2 Gbps in each direction (ring or torus topology).
  - Ethernet: star topology.
  - In most cases the limitation is the server's internal PCI bus system.
- Cluster middleware, to support a Single System Image (SSI).
- Resource management and scheduling software: initial installation, administration, scheduling, allocation of hardware, allocation of software components.
- Parallel programming environments and tools: compilers, Parallel Virtual Machine (PVM), Message Passing Interface (MPI); a minimal MPI example follows below.
- Parallel and sequential applications.
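As a minimal sketch of the MPI programming model (added here, not part of the original slides), the program below is started as one copy per processor by the resource manager; each copy learns its rank, the size of the job, and the node it runs on. It would typically be compiled with mpicc and launched with mpirun or the local scheduler, though the exact commands depend on the installation.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, name_len;
    char name[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);                  /* join the parallel job        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* which process am I?          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* how many processes in total? */
    MPI_Get_processor_name(name, &name_len); /* which node am I running on?  */

    printf("process %d of %d on node %s\n", rank, size, name);

    MPI_Finalize();
    return 0;
}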

18 Types of Clusters

19 High performance vs. high availability
- High performance clusters: built for high computing capability.
- High availability clusters: consider the possibility of failure of each hardware or software component, and include redundancy. A subset of this type is the load balancing cluster, typically used for business applications such as web servers.

20 Homogeneous vs. heterogeneous clusters
- Homogeneous clusters: all nodes have similar properties; each node is much like any other, and the amount of memory and the interconnects are similar.
- Heterogeneous clusters: nodes have different characteristics, in the sense of memory and interconnect performance.

21 Single-tier vs. multi-tier clusters
- Single-tier clusters: no hierarchy of nodes is defined, and any node may be used for any purpose. The main advantage is simplicity; the main disadvantage is limited expandability.
- Multi-tier clusters: there is a hierarchy between nodes, with node sets where each set has a specialized function.

22 Clusters in Flynn's Taxonomy

23 Clusters are Multiple Instruction Multiple Data (MIMD) systems with Distributed Memory Processing (DMP).
- Each node has its own instruction memory and data memory.
- Programs cannot directly access the memory of remote systems in the cluster; they have to use some kind of message passing between nodes (see the sketch below).
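A minimal sketch of that message passing pattern, using MPI point-to-point calls (added here as an illustration, not from the slides); it needs at least two processes to run. Rank 0's data only becomes visible to rank 1 through an explicit message over the interconnect.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double value = 0.0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 3.14;
        /* rank 0's memory is private; the only way rank 1 sees this
           value is through an explicit message */
        MPI_Send(&value, 1, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %.2f from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}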

24 Benefits of Clusters

25 Benefits
- Ease of building: no expensive and long development projects.
- Price/performance benefit: highly available COTS products are used.
- Flexibility of configuration: the number of nodes, node performance, and interconnection topology can be upgraded, and the system can be modified without loss of prior work.
- Scale up: increase the throughput of each computing node.
- Scale out: increase the number of computing nodes; this requires efficient I/O between nodes and cost-effective management of a large number of nodes.

26 Efficiency of a Cluster

27 Cluster throughput is a function of the following:
- CPUs: total number and speed of CPUs.
- Parallel algorithms: efficiency of the parallel algorithms.
- Inter-process communication: efficiency of the inter-process communication between the computing nodes.
- Storage I/O: frequency and size of input data reads and output data writes.
- Job scheduling: efficiency of the scheduling.

28 Top 500 HPCs

Year  Percentage  # Processors  Max. Linpack (GF)
2001     8.60         22526          18832
2002    18.60         66614          77707
2003    42.00        120606         264332
2004    58.80        210253         614022
2005    72.00        321805        1109962

Linpack is a measure of a processor's floating-point execution rate.
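As a rough illustration of what a gigaflops figure means (this is not the Linpack benchmark, just the flops-per-second arithmetic behind such numbers), the sketch below times a dense matrix-vector product with a known operation count.

#include <stdio.h>
#include <time.h>

#define N 2000

static double a[N][N], x[N], y[N];

int main(void)
{
    for (int i = 0; i < N; i++) {
        x[i] = 1.0;
        for (int j = 0; j < N; j++)
            a[i][j] = 0.001 * (i + j);
    }

    clock_t start = clock();
    /* dense matrix-vector product: 2*N*N floating-point operations */
    for (int i = 0; i < N; i++) {
        double sum = 0.0;
        for (int j = 0; j < N; j++)
            sum += a[i][j] * x[j];
        y[i] = sum;
    }
    double seconds = (double)(clock() - start) / CLOCKS_PER_SEC;
    if (seconds <= 0.0)
        seconds = 1e-9;  /* guard against timer resolution */

    double flops = 2.0 * N * N;
    printf("%.0f flops in %.4f s = %.2f GFLOP/s\n",
           flops, seconds, flops / seconds / 1e9);
    return 0;
}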

29 BlueGene/L: An Example Cluster-Based Computer
- As of January 2005, ranked as the world's most powerful computer in terms of Linpack performance.
- Developed by IBM and the US Department of Energy.
- A heterogeneous cluster, with nodes dedicated to specific functions.
- A cluster of 65536 nodes; computing nodes have two processors, giving more than 130000 processors in total, and each processor has a dual floating point unit.
- Includes host nodes for management purposes.
- 1024 of the nodes are I/O nodes to the outside world; the I/O nodes run Linux, while the compute nodes run a simple specialized OS.
- Uses message passing over an interconnection network with a tree structure.
- Each computing node has 2 Gb of RAM, shared between the node's two processors.

30 Ongoing Research on Clusters

31 Ongoing research topics
- Management of large clusters: request distribution, optimizing load balance, health monitoring of the cluster.
- Connecting clusters using Grid technology.
- Nonlinearity of scaling: ideally, n CPUs should perform n times better than a single CPU, but in practice the performance gain does not increase linearly (see the sketch below).
- The main challenge: developing parallel algorithms that minimize inter-node communication.
- Program development for parallel architectures is difficult for two reasons: describing the application's concurrency and data dependencies, and exploiting the processing resources of the architecture to obtain an efficient implementation on specific hardware.
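A standard way to model this nonlinearity is Amdahl's law, which is not stated on the slide but is the usual reference point: if a fraction s of the work is inherently serial, the speedup on n CPUs is at most 1 / (s + (1 - s) / n). The small sketch below shows how quickly the gain flattens even for a modest serial fraction.

#include <stdio.h>

/* Amdahl's law: speedup(n) = 1 / (s + (1 - s) / n),
   where s is the fraction of the program that stays serial. */
static double amdahl(double serial_fraction, int n)
{
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / n);
}

int main(void)
{
    double s = 0.05;  /* assume 5% of the work cannot be parallelized */
    int cpus[] = {1, 2, 8, 64, 1024};

    for (int i = 0; i < 5; i++)
        printf("%5d CPUs -> speedup %6.2fx (ideal %5dx)\n",
               cpus[i], amdahl(s, cpus[i]), cpus[i]);
    return 0;
}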

32 A Multi-tier Cluster Architecture
- Input traffic arrives at one or more front-end load balancers.
- Static content is served at the front tier.
- Application servers serve the dynamic content; they are responsible for financial transaction functions such as order entry and catalog search.
- The third tier may consist of multiple database servers, which are specialized for different data sets.
- Load balancing is performed between tiers (a minimal round-robin sketch follows below).
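The balancing step itself can be as simple as round-robin selection. The sketch below is an illustration rather than part of the original slides, and the server names are placeholders: each incoming request is handed to the next application server in the pool.

#include <stdio.h>

/* Placeholder back-end pool for the second tier (application servers). */
static const char *backends[] = {"app-1", "app-2", "app-3"};
static const int n_backends = 3;

/* Round-robin: hand each new request to the next server in the pool,
   so load spreads evenly when requests cost roughly the same. */
static const char *pick_backend(void)
{
    static int next = 0;
    const char *chosen = backends[next];
    next = (next + 1) % n_backends;
    return chosen;
}

int main(void)
{
    for (int request = 1; request <= 7; request++)
        printf("request %d -> %s\n", request, pick_backend());
    return 0;
}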

33 To sum up...
- Clusters are highly promising for HPC: cheap, easy to obtain and develop, and applicable to many diverse applications.
- They are not the answer to every question: not applicable to non-parallelizable applications.

34 Thanks for listening… Questions?

