Clusters Part 1 - Definition of and motivation for clusters
Lars Lundberg
The slides in this presentation cover Part 1 (Chapters 1-4) in Pfister's book.
Introduction
There are three ways of doing anything faster:
- Work harder (increased processor speed)
- Work smarter (better algorithms)
- Get help (parallel processing)
This course is about clusters, and they are one way of "getting help", i.e. one way of obtaining parallel processing.
Work harder
Processor speed is increasing by a factor of two every 9-18 months (depending on who you ask).
Getting help
- Parallel processing occurs on many levels, e.g., instruction-level parallelism inside the processor (superscalar execution).
- We are focusing on parallelism that is visible in the program in the form of multiple processes or threads.
- It is cost-effective to build a large computer based on a (large) number of cheap microprocessors.
- It is relatively easy to build the multiprocessor hardware, but much more difficult to build good parallel software.
- Massive multiprocessors can potentially solve challenging problems, e.g., global weather simulation and full-system simulation of cars and airplanes.
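As a minimal sketch of program-level parallelism (not from Pfister's book), a computation can be split into chunks that run in parallel workers. Thread workers are used here for portability; on a cluster the workers would be processes on separate machines. All names are illustrative.

```python
# Program-level parallelism: split a computation into chunks and run
# them in parallel workers. Threads are used here for portability; a
# cluster would run worker processes on separate machines instead.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(bounds):
    lo, hi = bounds
    return sum(range(lo, hi))

def parallel_sum(n, workers=4):
    # Split [0, n) into roughly equal chunks, one per worker.
    step, chunks = n // workers, []
    for i in range(workers):
        chunks.append((i * step, (i + 1) * step if i < workers - 1 else n))
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

assert parallel_sum(1_000_000) == sum(range(1_000_000))
```

The split/compute/combine structure is the same whether the workers are threads, processes, or cluster nodes; only the communication cost differs.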
Lowly Parallel Processing
The current market for massively parallel multiprocessors is small, but it is increasingly interesting to connect a small number (e.g. 2-16) of computers in a cluster. There are at least two reasons for this:
- Microprocessors are getting faster, i.e. many problems can be solved without the aid of massively parallel computers.
- Availability (i.e. non-stop operation) is becoming increasingly important.
Availability
It is (almost) always desirable to build systems that will not stop working, and cluster technology makes it possible to obtain high availability at a reasonable cost. In its simplest form, cluster availability is obtained by having two computers: one active (primary) computer and one stand-by (secondary) computer. If the primary computer fails, you simply switch to the secondary computer.
Availability continued
Instead of having one stand-by computer that just "sits getting dusty" almost all the time, we use both computers, and if either fails we move all the work to the remaining one until the failed computer is repaired. We have now started to do (lowly) parallel processing across those two computers. To obtain even higher availability, we may want to use more than two computers.
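The active/standby scheme above can be sketched as a simple heartbeat monitor: work goes to the primary while its heartbeat is fresh, and switches to the secondary when the heartbeat times out. The node names and timeout values below are invented for illustration, not taken from the book.

```python
# Illustrative two-node failover: a monitor records heartbeats and
# directs work to the primary until its heartbeat goes stale, then
# falls back to the secondary. Names and timeouts are invented.
import time

class FailoverPair:
    def __init__(self, primary, secondary, timeout=5.0):
        now = time.monotonic()
        self.beats = {primary: now, secondary: now}
        self.primary, self.secondary = primary, secondary
        self.timeout = timeout

    def heartbeat(self, node):
        # Each node calls this periodically to prove it is alive.
        self.beats[node] = time.monotonic()

    def active_node(self):
        # Use the primary unless its heartbeat has gone stale.
        if time.monotonic() - self.beats[self.primary] < self.timeout:
            return self.primary
        return self.secondary

pair = FailoverPair("node_a", "node_b", timeout=0.1)
assert pair.active_node() == "node_a"
pair.beats["node_a"] -= 1.0   # simulate a missed heartbeat
assert pair.active_node() == "node_b"
```

Real cluster products add detail this sketch omits, e.g. mutual monitoring, fencing of the failed node, and taking over its disk and network identity.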
Motivation for Clusters
Based on the discussion on the previous slides, we conclude that the primary reason for using clusters is availability, not processing capacity, at least from an industrial perspective. People from academia, however, are generally interested in clusters because they provide inexpensive massively parallel computers, i.e. clusters are popular in both industry and academia, but for different reasons.
Cluster Example - Brewery
If one system goes down (e.g. Manufacturing), its task is picked up by another system (e.g. Administration).
[Figure: three nodes (Distribution, Manufacturing, Administration) connected to a shared disk]
Cluster Example - Office Environment
If the active server fails, the work will be picked up by the standby server. The standby server's disk is kept consistent with the active server's disk at all times.
[Figure: an active server and a standby server]
Cluster Example - Web Server
Some popular Internet sites need more than one server to handle all incoming requests. In that case one can send all requests to a dispatcher and let the dispatcher distribute the load among a number of servers. When serving the Olympics in Nagano, IBM used this kind of configuration with 53 servers.
[Figure: Internet -> router -> dispatcher -> servers]
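The dispatcher idea above can be sketched in a few lines: requests are spread over a pool of equivalent servers, here with a simple round-robin policy. The server names and the round-robin choice are illustrative assumptions; the slide does not say which policy IBM's dispatcher used.

```python
# Sketch of a dispatcher spreading requests over equivalent servers
# round-robin. Server names and the policy are illustrative only.
from itertools import cycle

class Dispatcher:
    def __init__(self, servers):
        self._next = cycle(servers)   # endless round-robin iterator

    def dispatch(self, request):
        # Pick the next server in turn and hand it the request.
        server = next(self._next)
        return server, request

d = Dispatcher(["web1", "web2", "web3"])
targets = [d.dispatch(f"req{i}")[0] for i in range(6)]
assert targets == ["web1", "web2", "web3"] * 2
```

Production dispatchers typically refine this with health checks (skip a failed server) and load-aware policies (send to the least-busy server), which is how the same mechanism also delivers availability.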
Cluster Example - Our Beowulf Cluster
This is the system that you will use in the laboratory exercises.
[Figure: clients connected via the Internet to the front-end node (king)]
Database Cluster Products
From the clients' point of view, the database server processes are equivalent.
[Figure: clients, DB server processes, shared disk]
The standard reasons for using parallel and distributed systems in general
- Performance (always important)
- Availability (in most cases the most important reason for using clusters)
- Price/performance (clusters consist of standard computers, which generally have a good price/performance ratio)
- Incremental growth (one can incrementally extend the system by adding more computers)
- Scaling (there is no upper limit on the number of computers in a cluster, as opposed to the maximum number of processors in an SMP)
- Scavenging (turn the idle time on the organization's computers into something useful)
Trends that promote clusters
- Very high-performance microprocessors, i.e. the need for massive parallelism is decreasing
- Communication technology is improving rapidly, e.g. Fibre Channel, Gbit networks, etc.
- Standard tools and protocols for distributed computing, e.g. TCP/IP
- The need for high availability is increasing
Problems with cluster systems
- Lack of "single system image" software. The important exception in parallel processing is the SMP. This is probably a major reason why SMPs have been relatively successful.
- Limited exploitation. Only a limited number of software products currently support clusters.
Consequently, the problem with clusters is not hardware; it is software.
The Need for High Availability
There are a number of reasons why availability is becoming increasingly important:
- The Internet: if your site is down, you will lose customers immediately
- Remote access from employees, i.e. people working from their homes and sales personnel downloading presentations and price lists
- Centralized server resources and thin clients: reduced maintenance cost
- Etc.
Definition of a Cluster
A cluster is a type of parallel system that:
- consists of a collection of interconnected whole computers,
- and is used as a single, unified computing resource.
A "whole computer" could be a uni-processor or an SMP.
Clusters versus SMPs
- Clusters are composed of whole computers; SMPs are composed of processors
- Compared to an SMP, it is easier to obtain high availability in a cluster
- Compared to an SMP, it is easier to incrementally increase the size of a cluster
- SMPs are easier to maintain from a system administrator's point of view
- On an SMP you will often get away with only one license for your favorite software
Clusters versus distributed systems
- The nodes in a distributed system have their own identity; from the outside, the cluster nodes are anonymous
- The computers in a distributed system often have dedicated roles, e.g. servers and clients; the computers in a cluster are usually equal
- A cluster can be one node in a distributed system
System Size
Due to the rapid growth in processor speed, neither parallel nor distributed systems are particularly interesting unless they can scale to thousands of processors/computers. Clusters, on the other hand, are interesting (for availability reasons) even for systems with ten or fewer computers.