Presentation on theme: "High Performance Cluster Computing: Architectures and Systems"— Presentation transcript:
1 High Performance Cluster Computing: Architectures and Systems
Book Editor: Rajkumar Buyya
Slides: Hai Jin and Raj Buyya
Internet and Cluster Computing Center
2 Cluster Computing at a Glance
Chapter 1: by M. Baker and R. Buyya
- Introduction
- Scalable Parallel Computer Architecture
- Towards Low Cost Parallel Computing and Motivations
- Windows of Opportunity
- A Cluster Computer and its Architecture
- Clusters Classifications
- Commodity Components for Clusters
- Network Services/Communications SW
- Cluster Middleware and Single System Image
- Resource Management and Scheduling (RMS)
- Programming Environments and Tools
- Cluster Applications
- Representative Cluster Systems
- Cluster of SMPs (CLUMPS)
- Summary and Conclusions
3 Resource Hungry Applications
Solving grand challenge applications using computer modeling, simulation and analysis:
- Aerospace
- Internet & E-commerce
- Life Sciences / Digital Biology
- CAD/CAM
- Military Applications
4 How to Run Applications Faster?
There are 3 ways to improve performance:
- Work Harder
- Work Smarter
- Get Help
Computer analogy:
- Use faster hardware
- Use optimized algorithms and techniques to solve computational tasks
- Use multiple computers to solve a particular task
5 Scalable (Parallel) Computer Architectures
Taxonomy based on how processors, memory & interconnect are laid out and how resources are managed:
- Massively Parallel Processors (MPP)
- Symmetric Multiprocessors (SMP)
- Cache-Coherent Non-Uniform Memory Access (CC-NUMA)
- Clusters
- Distributed Systems – Grids/P2P
6 Scalable Parallel Computer Architectures
MPP:
- A large parallel processing system with a shared-nothing architecture
- Consists of several hundred nodes with a high-speed interconnection network/switch
- Each node consists of a main memory & one or more processors
- Each node runs a separate copy of the OS
SMP:
- 2-64 processors today
- Shared-everything architecture
- All processors share all the global resources available
- A single copy of the OS runs on these systems
7 Scalable Parallel Computer Architectures
CC-NUMA:
- A scalable multiprocessor system having a cache-coherent nonuniform memory access architecture
- Every processor has a global view of all of the memory
Clusters:
- A collection of workstations / PCs that are interconnected by a high-speed network
- Work as an integrated collection of resources
- Have a single system image spanning all their nodes
Distributed systems:
- Considered conventional networks of independent computers
- Have multiple system images, as each node runs its own OS
- The individual machines could be combinations of MPPs, SMPs, clusters, & individual computers
8 Key Characteristics of Scalable Parallel Computers
9 Rise and Fall of Computer Architectures
- Vector Computers (VC) - proprietary systems: provided the breakthrough needed for the emergence of computational science, but they were only a partial answer.
- Massively Parallel Processors (MPP) - proprietary systems: high cost and a low performance/price ratio.
- Symmetric Multiprocessors (SMP): suffer from scalability limits.
- Distributed Systems: difficult to use and hard to extract parallel performance from.
- Clusters - gaining popularity: High Performance Computing (Commodity Supercomputing) and High Availability Computing (Mission Critical Applications).
10 The Dead Supercomputer Society (http://www.paralogos.com/DeadSuper/)
- ACRI
- Alliant
- American Supercomputer
- Ametek
- Applied Dynamics
- Astronautics
- BBN
- CDC
- Convex (C4600)
- Cray Computer
- Cray Research (SGI, then Tera)
- Culler-Harris
- Culler Scientific
- Cydrome
- Dana/Ardent/Stellar
- Elxsi
- ETA Systems
- Evans & Sutherland Computer Division
- Floating Point Systems
- Galaxy YH-1
- Goodyear Aerospace MPP
- Gould NPL
- Guiltech
- Intel Scientific Computers
- Intl. Parallel Machines
- KSR
- MasPar
- Meiko
- Myrias
- Saxpy
- Scientific Computer Systems (SCS)
- Soviet Supercomputers
- Suprenum
- Thinking Machines
11 Computer Food Chain: Causing the Demise of Specialized Systems
Demise of mainframes, supercomputers, & MPPs
12 Towards Clusters
The promise of supercomputing to the average PC user?
13 Technology Trends...
Performance of PC/workstation components has almost reached the performance of those used in supercomputers:
- Microprocessors (50% to 100% per year)
- Networks (Gigabit SANs)
- Operating systems (Linux, ...)
- Programming environments (MPI, ...)
- Applications (.edu, .com, .org, .net, .shop, .bank)
The rate of performance improvement of commodity systems is much more rapid than that of specialized systems.
15 Towards Commodity Parallel/Cluster Computing
- Linking together two or more computers to jointly solve computational problems
- Since the early 1990s, an increasing trend to move away from expensive and specialized proprietary parallel supercomputers towards clusters of workstations
- Hard to find money to buy expensive systems
- Rapid improvement in the availability of commodity high-performance components for workstations and networks
- Low-cost commodity supercomputing: from specialized traditional supercomputing platforms to cheaper, general-purpose systems consisting of loosely coupled components built up from single- or multiprocessor PCs or workstations
16 History: Clustering of Computers for Collective Computing
[Timeline figure: 1960, 1980s, 1990, 1995+, 2000+ – evolution toward PDA clusters]
17 What is a Cluster?
A cluster is a type of parallel or distributed processing system which consists of a collection of interconnected stand-alone computers cooperatively working together as a single, integrated computing resource.
A node:
- a single or multiprocessor system with memory, I/O facilities, & OS
A cluster:
- generally 2 or more computers (nodes) connected together
- in a single cabinet, or physically separated & connected via a LAN
- appears as a single system to users and applications
- provides a cost-effective way to gain features and benefits
19 So What's So Different about Clusters?
- Commodity parts?
- Communications packaging?
- Incremental scalability?
- Independent failure?
- Intelligent network interfaces?
- Complete system on every node: virtual memory, scheduler, files, ...
- Nodes can be used individually or jointly...
20 Windows of Opportunities
- Parallel processing: use multiple processors to build MPP/DSM-like systems for parallel computing
- Network RAM: use the memory associated with each workstation as an aggregate DRAM cache
- Software RAID (redundant array of inexpensive disks): use the arrays of workstation disks to provide cheap, highly available and scalable file storage; possible to provide parallel I/O support to applications
- Multipath communication: use multiple networks for parallel data transfer between nodes
21 Cluster Design Issues
- Enhanced performance (performance @ low cost)
- Enhanced availability (failure management)
- Single system image (look-and-feel of one system)
- Size scalability (physical & application)
- Fast communication (networks & protocols)
- Load balancing (CPU, net, memory, disk)
- Security and encryption (clusters of clusters)
- Distributed environment (social issues)
- Manageability (admin. and control)
- Programmability (simple API if required)
- Applicability (cluster-aware and non-aware applications)
25 High Throughput Cluster (Idle Resource Harvesting)
- Shared pool of computing resources: processors, memory, disks, interconnect
- Guarantee at least one workstation to many individuals (when active)
- Deliver a large % of collective resources to a few individuals at any one time
29 Prominent Components of Cluster Computers (I)
Multiple high performance computers:
- PCs
- Workstations
- SMPs (CLUMPS)
- Distributed HPC systems leading to Grid Computing
30 System CPUs
Processors:
- Intel x86 processors (Pentium Pro and Pentium Xeon); AMD x86, Cyrix x86, etc.
- Digital Alpha: the Alpha processor integrates processing, memory controller, and network interface into a single chip
- IBM PowerPC
- Sun SPARC
- SGI MIPS
- HP PA
31 System Disk
Disk and I/O:
- Overall improvement in disk access time has been less than 10% per year
- Amdahl's law: the speed-up obtained from faster processors is limited by the slowest system component
- Parallel I/O: carry out I/O operations in parallel, supported by a parallel file system based on hardware or software RAID
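The Amdahl's law point above can be made concrete with a short sketch (the function name and the sample figures are illustrative, not from the slide):

```python
def amdahl_speedup(parallel_fraction: float, n_processors: int) -> float:
    """Overall speedup when only a fraction of the workload runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_processors)

# Even with 100 processors, a 10% serial part (e.g., slow disk I/O)
# caps the overall speedup far below 100:
print(amdahl_speedup(0.9, 100))   # ~9.17
print(amdahl_speedup(1.0, 8))     # 8.0 (ideal: no serial component)
```

This is why slow disk access limits the benefit of faster processors, and why parallel I/O support matters for cluster performance.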
32 Commodity Components for Clusters (II): Operating Systems
2 fundamental services for users:
- Make the computer hardware easier to use (create a virtual machine that differs markedly from the real machine)
- Share hardware resources among users (processor - multitasking)
The new concept in OS services:
- Support multiple threads of control in a process itself (parallelism within a process, multithreading)
- POSIX thread interface is a standard programming environment
Trends:
- Modularity – MS Windows, IBM OS/2
- Microkernel – provides only essential OS services (high-level abstraction of the OS, portability)
33 Prominent Components of Cluster Computers
State-of-the-art operating systems:
- Linux (MOSIX, Beowulf, and many more)
- Microsoft NT (Illinois HPVM, Cornell Velocity)
- Sun Solaris (Berkeley NOW, C-DAC PARAM)
- IBM AIX (IBM SP2)
- HP UX (Illinois - PANDA)
- Mach, a microkernel-based OS (CMU)
- Cluster operating systems (Solaris MC, SCO Unixware, MOSIX (academic project))
- OS gluing layers (Berkeley Glunix)
34 Prominent Components of Cluster Computers (III)
High performance networks/switches:
- Ethernet (10 Mbps)
- Fast Ethernet (100 Mbps)
- Gigabit Ethernet (1 Gbps)
- SCI (Scalable Coherent Interface - 12 µs MPI latency)
- ATM (Asynchronous Transfer Mode)
- Myrinet (1.28 Gbps)
- QsNet (Quadrics Supercomputing World, 5 µs latency for MPI messages)
- Digital Memory Channel
- FDDI (Fiber Distributed Data Interface)
- InfiniBand
35 Prominent Components of Cluster Computers (IV)
Fast communication protocols and services (user-level communication):
- Active Messages (Berkeley)
- Fast Messages (Illinois)
- U-net (Cornell)
- XTP (Virginia)
- Virtual Interface Architecture (VIA)
36 Prominent Components of Cluster Computers (V)
Cluster middleware:
- Single System Image (SSI)
- System Availability (SA) infrastructure
Hardware:
- DEC Memory Channel, DSM (Alewife, DASH), SMP techniques
Operating system kernel/gluing layers:
- Solaris MC, Unixware, GLUnix, MOSIX
Applications and subsystems:
- Applications (system management and electronic forms)
- Runtime systems (software DSM, PFS, etc.)
- Resource management and scheduling (RMS) software: SGE (Sun Grid Engine), LSF, PBS, Libra: Economy Cluster Scheduler, NQS, etc.
37 Advanced Network Services/Communication SW
- The communication infrastructure supports protocols for: bulk-data transport, streaming data, group communications
- Communication services provide the cluster with important QoS parameters: latency, bandwidth, reliability, fault-tolerance
- Network services are designed as a hierarchical stack of protocols with a relatively low-level communication API, providing the means to implement a wide range of communication methodologies: RPC, DSM, stream-based and message-passing interfaces (e.g., MPI, PVM)
38 Prominent Components of Cluster Computers (VI)
Parallel programming environments and tools:
- Threads (PCs, SMPs, NOW, ...): POSIX threads, Java threads
- MPI (Message Passing Interface): Linux, NT, on many supercomputers
- PVM (Parallel Virtual Machine)
- Parametric programming
- Software DSMs (Shmem)
- Compilers: C/C++/Java; parallel programming with C++ (MIT Press book)
- RAD (rapid application development) tools: GUI-based tools for PP modeling
- Debuggers
- Performance analysis tools
- Visualization tools
45 Clusters Classification (V)
Node configuration:
- Homogeneous clusters: all nodes have similar architectures and run the same OS
- Heterogeneous clusters: nodes have different architectures and/or run different OSs
46 Clusters Classification (VI)
Levels of clustering:
- Group clusters (#nodes: 2-99): nodes are connected by a SAN like Myrinet
- Departmental clusters (#nodes: 10s to 100s)
- Organizational clusters (#nodes: many 100s)
- National metacomputers (WAN/Internet-based)
- International metacomputers (Internet-based, #nodes: 1000s to many millions): Grid computing, Web-based computing, peer-to-peer computing
47 Single System Image
[See SSI slides of next lecture]
51 Programming Environments and Tools (I)
Threads (PCs, SMPs, NOW, ...):
- In multiprocessor systems: used to simultaneously utilize all the available processors
- In uniprocessor systems: used to utilize the system resources effectively
- Multithreaded applications offer quicker response to user input and run faster
- Potentially portable, as there exists an IEEE standard for the POSIX threads interface (pthreads)
- Extensively used in developing both application and system software
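As a sketch of the multithreading idea, here is Python's threading module standing in for the POSIX C API (the worker function and partitioning scheme are invented for illustration; note that CPython threads interleave CPU-bound work rather than truly running it in parallel, unlike pthreads on an SMP):

```python
import threading

def partial_sum(chunk, results, idx):
    # Each thread computes the sum of its own slice of the data.
    results[idx] = sum(chunk)

data = list(range(1000))
n_threads = 4
step = len(data) // n_threads
results = [0] * n_threads
threads = [
    threading.Thread(target=partial_sum,
                     args=(data[i * step:(i + 1) * step], results, i))
    for i in range(n_threads)
]
for t in threads:
    t.start()
for t in threads:
    t.join()   # wait for all workers before combining partial results
total = sum(results)
```

The same fork/join structure is what a pthreads program would express with `pthread_create` and `pthread_join`.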
52 Programming Environments and Tools (II)
Message-passing systems (MPI and PVM):
- Allow efficient parallel programs to be written for distributed-memory systems
- The 2 most popular high-level message-passing systems: PVM & MPI
- PVM: both an environment & a message-passing library
- MPI: a message-passing specification, designed to be a standard for distributed-memory parallel computing using explicit message passing; an attempt to establish a practical, portable, efficient, & flexible standard for message passing
- Generally, application developers prefer MPI, as it is fast becoming the de facto standard for message passing
53 Programming Environments and Tools (III)
Distributed Shared Memory (DSM) systems:
- Message passing: the most efficient, widely used programming paradigm on distributed-memory systems, but complex & difficult to program
- Shared-memory systems: offer a simple and general programming model, but suffer from scalability limits
- DSM on a distributed-memory system: an alternative, cost-effective solution
- Software DSM: usually built as a separate layer on top of the communication interface; takes full advantage of the application characteristics: virtual pages, objects, & language types are units of sharing (TreadMarks, Linda)
- Hardware DSM: better performance, no burden on user & SW layers, fine granularity of sharing, extensions of the cache coherence scheme, & increased HW complexity (DASH, Merlin)
54 Programming Environments and Tools (IV)
Parallel debuggers and profilers.
Debuggers:
- Very limited
- HPDF (High Performance Debugging Forum), a Parallel Tools Consortium project started in 1996, developed the HPD version specification, which defines the functionality, semantics, and syntax for a command-line parallel debugger
- TotalView: a commercial product from Dolphin Interconnect Solutions; the only widely available GUI-based parallel debugger that supports multiple HPC platforms; only usable in homogeneous environments, where each process of the parallel application being debugged must run under the same version of the OS
55 Functionality of Parallel Debuggers
- Managing multiple processes and multiple threads within a process
- Displaying each process in its own window
- Displaying source code, stack trace, and stack frame for one or more processes
- Diving into objects, subroutines, and functions
- Setting both source-level and machine-level breakpoints
- Sharing breakpoints between groups of processes
- Defining watch and evaluation points
- Displaying arrays and their slices
- Manipulating code variables and constants
56 Programming Environments and Tools (V)
Performance analysis tools:
- Help a programmer understand the performance characteristics of an application
- Analyze & locate parts of an application that exhibit poor performance and create program bottlenecks
Major components:
- A means of inserting instrumentation calls to the performance monitoring routines into the user's applications
- A run-time performance library that consists of a set of monitoring routines
- A set of tools for processing and displaying the performance data
Issue with performance monitoring tools:
- Intrusiveness of the tracing calls and their impact on application performance: instrumentation affects the performance characteristics of the parallel application and thus provides a false view of its performance behavior
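The instrumentation-plus-monitoring-library structure described above can be sketched in a few lines (the decorator, the `profile_data` layout, and the `busy` routine are invented for illustration); it also hints at the intrusiveness problem, since the timing calls themselves add overhead:

```python
import time
from functools import wraps

profile_data = {}   # the "run-time performance library": name -> (calls, seconds)

def instrument(fn):
    """Insert a monitoring call around a user routine, recording count and time."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return fn(*args, **kwargs)
        finally:
            calls, total = profile_data.get(fn.__name__, (0, 0.0))
            profile_data[fn.__name__] = (calls + 1,
                                         total + time.perf_counter() - start)
    return wrapper

@instrument
def busy(n):
    return sum(range(n))

busy(10_000)
busy(10_000)
calls, seconds = profile_data["busy"]   # 2 calls and their accumulated time
```

A display tool would then process `profile_data` the way Pablo or Vampir process their trace files.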
57 Performance Analysis and Visualization Tools

| Tool | Supports |
| AIMS | Instrumentation, monitoring library, analysis |
| MPE | Logging library and snapshot performance visualization |
| Pablo | Monitoring library and analysis |
| Paradyn | Dynamic instrumentation running analysis |
| SvPablo | Integrated instrumentor, monitoring library and analysis |
| Vampir | Monitoring library, performance visualization |
| Dimemas | Performance prediction for message-passing programs |
| Paraver | Program visualization and analysis |
58 Programming Environments and Tools (VI)
Cluster administration tools:
- Berkeley NOW: gathers & stores data in a relational DB; uses a Java applet to allow users to monitor the system
- SMILE (Scalable Multicomputer Implementation using Low-cost Equipment): called K-CAP; consists of compute nodes, a management node, & a client that can control and monitor the cluster; K-CAP uses a Java applet to connect to the management node through a predefined URL address in the cluster
- PARMON: a comprehensive environment for monitoring large clusters; uses client-server techniques to provide transparent access to all nodes to be monitored (parmon-server & parmon-client)
60 Cluster Applications
- Numerous scientific & engineering applications
- Business applications: e-commerce applications (Amazon, eBay); database applications (Oracle on clusters)
- Internet applications: ASPs (Application Service Providers); computing portals; e-commerce and e-business
- Mission-critical applications: command-control systems, banks, nuclear reactor control, star-wars, and handling life-threatening situations
61 Some Cluster Systems: Comparison

| Project | Platform | Communications | OS | Other |
| Beowulf | PCs | Multiple Ethernet with TCP/IP | Linux and Grendel | MPI/PVM, Sockets and HPF |
| Berkeley NOW | Solaris-based PCs and workstations | Myrinet and Active Messages | Solaris + GLUnix + xFS | AM, PVM, MPI, HPF, Split-C |
| HPVM | PCs | Myrinet with Fast Messages | NT or Linux connection and global resource manager + LSF | Java-fronted, FM, Sockets, Global Arrays, SHMEM and MPI |
| Solaris MC | Solaris-supported PCs and workstations | Solaris-supported | Solaris + Globalization layer | C++ and CORBA |
62 Cluster of SMPs (CLUMPS)
- Clusters of multiprocessors (CLUMPS): expected to be the supercomputers of the future
- Multiple SMPs with several network interfaces can be connected using high-performance networks
- 2 advantages: benefit from the high-performance, easy-to-use-and-program SMP systems with a small number of CPUs; clusters can be set up with moderate effort, resulting in easier administration and better support for data locality inside a node
63 Many Types of Clusters
- High-performance clusters: Linux cluster; 1000 nodes; parallel programs; MPI
- Load-leveling clusters: move processes around to borrow cycles (e.g., MOSIX)
- Web-service clusters: LVS/Piranha; load-level TCP connections; replicate data
- Storage clusters: GFS; parallel filesystems; same view of data from each node
- Database clusters: Oracle Parallel Server
- High-availability clusters: ServiceGuard, LifeKeeper, FailSafe, heartbeat, failover clusters
64 Summary: Cluster Advantages
- The price/performance ratio of clusters is low compared with a dedicated parallel supercomputer.
- Incremental growth that often matches demand patterns.
- The provision of a multipurpose system: scientific, commercial, Internet applications.
- Clusters have become mainstream enterprise computing systems: in the 2003 list of Top 500 Supercomputers, over 50% were based on clusters, and many of them were deployed in industry. In the recent list, most of them are clusters!
66 Commodity Components for Clusters: Operating Systems
Linux:
- Unix-like OS
- Runs on cheap x86 platforms, yet offers the power and flexibility of Unix
- Readily available on the Internet and can be downloaded without cost
- Easy to fix bugs and improve system performance
- Users can develop or fine-tune hardware drivers, which can easily be made available to other users
- Features such as preemptive multitasking, demand-paged virtual memory, multiuser and multiprocessor support
67 Commodity Components for Clusters: Operating Systems
Solaris:
- UNIX-based multithreaded and multiuser OS
- Supports Intel x86 & SPARC-based platforms
- Real-time scheduling feature critical for multimedia applications
- Supports two kinds of threads: Light Weight Processes (LWPs) and user-level threads
- Supports both BSD and several non-BSD file systems: CacheFS, AutoClient, TmpFS (uses main memory to contain a file system), the proc file system, volume file systems
- Supports distributed computing & is able to store & retrieve distributed information
- OpenWindows allows applications to be run on remote systems
68 Commodity Components for Clusters: Operating Systems
Microsoft Windows NT (New Technology):
- Preemptive, multitasking, multiuser, 32-bit OS
- Object-based security model and a special file system (NTFS) that allows permissions to be set on a file and directory basis
- Supports multiple CPUs and provides multitasking using symmetric multiprocessing
- Supports different CPUs and multiprocessor machines with threads
- Has network protocols & services integrated with the base OS: several built-in networking protocols (IPX/SPX, TCP/IP, NetBEUI) & APIs (NetBIOS, DCE RPC, Windows Sockets (Winsock))
69 Cluster Interconnects: Comparison (created in 2000)

| | Myrinet | QSnet | Giganet | ServerNet2 | SCI | Gigabit Ethernet |
| Bandwidth (MBytes/s) | 140 (33 MHz) / 215 (66 MHz) | 208 | ~105 | 165 | ~80 | |
| MPI latency (µs) | 16.5 (33 MHz) / 11 (66 MHz) | 5 | ~20.2 | | 6 | |
| List price/port | $1.5K | $6.5K | ~$1.5K | | | |
| Hardware availability | Now | Q2 '00 | | | | Now (Linux support) |
| Maximum #nodes | 1000's | | | 64K | | |
| Protocol implementation | Firmware on adapter | Firmware on adapter | Implemented in hardware | Implemented in hardware | | Software (TCP/IP, VIA) |
| VIA support | Soon | None | NT/Linux | Done in hardware | | |
| MPI support | 3rd party | Quadrics/Compaq | 3rd party | Compaq/3rd party | | MPICH over TCP/IP |

[Some table cells were lost in extraction.]
$ figures as per summer (June) 2000. Myrinet GM-MPI: 215 MBytes/s, 11 µs latency, with a LANai9 at 132 MHz, available now.
70 Commodity Components for Clusters: Cluster Interconnects
- Nodes communicate over high-speed networks using a standard networking protocol such as TCP/IP or a low-level protocol such as AM
- Standard Ethernet: 10 Mbps; cheap, easy way to provide file and printer sharing; bandwidth & latency are not balanced with the computational power
- Ethernet, Fast Ethernet, and Gigabit Ethernet: Fast Ethernet – 100 Mbps; Gigabit Ethernet preserves Ethernet's simplicity and delivers a very high bandwidth to aggregate multiple Fast Ethernet segments
71 Commodity Components for Clusters: Cluster Interconnects
Myrinet:
- 1.28 Gbps full-duplex interconnection network
- Uses low-latency cut-through routing switches, able to offer fault tolerance by automatic mapping of the network configuration
- Supports both Linux & NT
- Advantages: very low latency (5 µs, one-way point-to-point); very high throughput; programmable on-board processor for greater flexibility
- Disadvantages: expensive ($1500 per host); complicated scaling (switches with more than 16 ports are unavailable)