1 CMPE 511 HIGH PERFORMANCE COMPUTING CLUSTERS Dilek Demirel İşçi

2 Motivation  Why is so much computation needed?
 Genetic engineering: searching for matching DNA patterns in large DNA banks.
 Cosmology: simulating very complex systems, such as the formation of a galaxy.
 Climate: solving very high precision floating-point calculations; simulating chaotic systems.
 Financial modeling and commerce: simulating chaotic systems, much like the climate modeling problem.
 Cryptography: searching very large state spaces to find a cryptographic key; factoring very large numbers.
 Software: searching large state spaces to evaluate and verify software.

3  Ways to improve performance:
 Work harder: hardware improvements.
 Work smarter: better algorithms.
 Get help: parallelism.

4 Moore's Law  Processing capacity doubles roughly every 18 months.  The performance improvements of high performance computing have been higher than Moore's Law alone would predict, arguably due to parallelism.
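Stated as a formula (a restatement of the slide's claim, not from the deck, with C_0 the starting capacity and t in months):

\[ C(t) = C_0 \cdot 2^{t/18} \]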

5 Types of Parallel Architectures

6  Flynn's categories:
 Single Instruction Single Data (SISD)
 Single Instruction Multiple Data (SIMD)
 Multiple Instruction Single Data (MISD)
 Multiple Instruction Multiple Data (MIMD)

7 Single Instruction Single Data (SISD)  No parallelism  Simple single processor

8  Single Instruction Multiple Data (SIMD):  A single instruction is executed by multiple processors on different data streams.  There is a single instruction memory but multiple data memory units.  Vector architectures are of the SIMD type.
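As an illustration (not from the slides; the function and names are invented), the loop below applies one operation to many data elements, which is exactly the pattern a vectorizing compiler maps onto SIMD hardware:

/* saxpy: y[i] = a*x[i] + y[i]. The same instruction is applied
 * element-wise across the data, so a vectorizing compiler can emit
 * SIMD (vector) instructions for the loop body. */
void saxpy(int n, float a, const float *x, float *y)
{
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}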

9  Multiple Instruction Single Data (MISD):  Not commercially built.  Refers to a structure in which a single data stream is operated on by different functional units.

10  Multiple Instruction Multiple Data (MIMD):  Each processor has its own instruction and data memory. This is the type we are interested in.

11 High Performance Computing Techniques
 Supercomputers: custom built; shared memory processing (SMP); optimized processors; can attack problems that are not parallelizable.
 Clusters: consist of more than one computer; use parallelism; distributed memory processing.
 Grid systems: "the Internet is the computer"; no geographical limitations.

12  Main idea in cluster architectures:  the old idea of parallel computing  (physical clustering of general purpose hardware and the message passing of distributed computing)  combined with new low cost technology (mass market COTS PCs and networking products).

13 What is a Cluster???

14  Many definitions:  A cluster is two or more independent computers connected by a dedicated network to perform a joint task.  A cluster is a group of servers that coordinate their actions to provide scalable, highly available services.

15 A cluster is a type of parallel and distributed processing system, which consists of a collection of interconnected stand-alone computers working together as a single, integrated computing resource.

16 Basic cluster
 Multiple computing nodes:
  low cost,
  each a fully functioning computer with its own memory, CPU, and possibly storage,
  each running its own instance of an operating system.
 Computing nodes connected by interconnects, typically low cost, high bandwidth, and low latency.
 Permanent, high performance data storage.
 A resource manager to distribute and schedule jobs.
 Middleware that allows the computers to act as a distributed or parallel system.
 Parallel applications designed to run on the cluster.

17 A cluster architecture
 Computing nodes and master nodes.
 High speed interconnect:
  Myricom: 1.28 Gbps in each direction.
  IEEE SCI: latency under 2.5 microseconds, 3.2 Gbps in each direction (ring or torus topology).
  Ethernet: star topology.
  In most cases the limitation is the server's internal PCI bus system.
 Cluster middleware, to support a Single System Image (SSI).
 Resource management and scheduling software: initial installation, administration, scheduling, allocation of hardware, allocation of software components.
 Parallel programming environments and tools: compilers, Parallel Virtual Machine (PVM), Message Passing Interface (MPI).
 Parallel and sequential applications.
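Since the slide names MPI as the programming interface, here is a minimal MPI program in C (an illustrative sketch, not part of the original deck); one executable is started as N cooperating processes spread over the cluster's nodes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);                /* start the MPI runtime   */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id       */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total process count     */
    printf("Hello from process %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}

Such a program is typically compiled with mpicc and launched with mpirun -np N, letting the middleware place the N processes on the computing nodes.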

18 Types of Clusters

19
 High performance clusters: built for high computing capability.
 High availability clusters: consider the possibility of failure of each hardware or software component; include redundancy.  A subset of this type is the load balancing cluster, typically used for business applications such as web servers.

20
 Homogeneous clusters: all nodes have similar properties; each node is much like any other; the amount of memory and the interconnects are similar.
 Heterogeneous clusters: nodes have different characteristics, in the sense of memory and interconnect performance.

21
 Single-tier clusters: no hierarchy of nodes is defined; any node may be used for any purpose.  The main advantage of a single-tier cluster is its simplicity; the main disadvantage is the limit on its expansion.
 Multi-tier clusters: there is a hierarchy between nodes; there are node sets, and each set has a specialized function.

22 Clusters in Flynn's Taxonomy

23
 Multiple Instruction Multiple Data (MIMD): each of the nodes has its own instruction memory and data memory.
 Distributed Memory Processing (DMP): programs cannot directly access the memory of remote systems in the cluster; they must use some form of message passing between nodes.
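To make the message passing concrete, the sketch below (illustrative C code using standard MPI calls, not from the deck) shows one node shipping a value to another rather than reading remote memory directly; run it with at least two processes:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        value = 42;  /* this data lives only in node 0's memory */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        /* node 1 cannot dereference node 0's memory; it receives a message */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("Process 1 received %d\n", value);
    }
    MPI_Finalize();
    return 0;
}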

24 Benefits of Clusters

25
 Ease of building:  no expensive and long development projects.
 Price/performance benefit:  highly available COTS products are used.
 Flexibility of configuration:  the number of nodes, the nodes' performance, and the interconnection topology can be upgraded; the system can be modified without loss of prior work.
 Scale up: increase the throughput of each computing node.
 Scale out: increase the number of computing nodes; requires efficient I/O between nodes and cost effective management of a large number of nodes.

26 Efficiency of a Cluster

27  Cluster throughput is a function of the following:
 CPUs: total number and speed of CPUs.
 Efficiency of the parallel algorithms.
 Inter-process communication: efficiency of the inter-process communication between the computing nodes.
 Storage I/O: frequency and size of input data reads and output data writes.
 Job scheduling: efficiency of the scheduling.
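These factors can be summarized with the standard speedup and efficiency measures (textbook definitions, not from the slide): with $T_1$ the run time on one node and $T_N$ the run time on $N$ nodes,

\[ S(N) = \frac{T_1}{T_N}, \qquad E(N) = \frac{S(N)}{N}, \]

where communication, storage I/O, and scheduling overheads are what push $E(N)$ below 1.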

28 Top 500 HPCs
 [Table: Year / Percentage / # Processors / Max. Linpack (GF); the figures were not preserved in the transcript.]
 Linpack is a measure of a processor's floating point execution.

29 BlueGene/L: An Example Cluster-Based Computer
 By January 2005 it was ranked the world's most powerful computer in terms of Linpack performance.
 Developed by IBM and the US Department of Energy.
 A heterogeneous cluster, with nodes dedicated to specific functions.
 A cluster of … nodes.  Computing nodes have two processors, resulting in more than … processors in total.  Each processor has a dual floating point unit.
 Includes host nodes for management purposes.
 1024 of the nodes are I/O nodes to the outside world; the I/O nodes run Linux, while the compute nodes run a simple specialized OS.
 Uses message passing over an interconnection network with a tree structure.
 Each computing node has 2 GB of RAM, shared between the node's two processors.

30 Ongoing Research on Clusters

31
 Management of large clusters:  request distribution,  optimizing load balance,  health monitoring of the cluster,  connecting clusters using Grid technology.
 Nonlinearity of scaling:  in the ideal case, n CPUs should perform n times better than a single CPU; in practice the performance gain does not increase linearly.
 The main challenge: developing parallel algorithms that minimize inter-node communication.
 Program development for parallel architectures is difficult for two reasons:  describing the application's concurrency and data dependencies,  and exploiting the processing resources of the architecture to obtain an efficient implementation for specific hardware.
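The nonlinearity has a classical formalization in Amdahl's law (a standard result, not stated on the slide): if a fraction $p$ of a program parallelizes perfectly and the rest is serial, the speedup on $N$ CPUs is

\[ S(N) = \frac{1}{(1 - p) + p/N} \le \frac{1}{1 - p}. \]

For example, with $p = 0.95$ even unlimited CPUs yield at most a 20x speedup, and inter-node communication costs lower the curve further.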

32 A Multi-tier Cluster Architecture
 Input traffic arrives at one or more front end load balancers.
 The first tier serves the static content.
 Application servers serve the dynamic content; they are responsible for financial transaction functions such as order entry and catalog search.
 The third tier may consist of multiple database servers, specialized for different data sets.
 Load balancing is performed between tiers.
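A minimal sketch of the front end's dispatch decision (illustrative C only; the backend names and the round-robin policy are assumptions, not details from the slide):

#include <stdio.h>

/* Hypothetical servers in the next tier. */
static const char *backends[] = { "app1", "app2", "app3" };
enum { N_BACKENDS = 3 };

/* Round-robin load balancing: each incoming request is handed to the
 * next server in turn, spreading load evenly when requests have
 * similar cost. */
static int next_idx = 0;
static const char *pick_backend(void)
{
    const char *b = backends[next_idx];
    next_idx = (next_idx + 1) % N_BACKENDS;
    return b;
}

int main(void)
{
    for (int req = 0; req < 7; req++)
        printf("request %d -> %s\n", req, pick_backend());
    return 0;
}

Real load balancers also weight servers by capacity and current load, which is the "optimizing load balance" research topic mentioned earlier.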

33  To sum up…
 Clusters are highly promising for HPC:  cheap,  easy to obtain and develop,  applicable to many diverse applications.
 They are not the answer to every question:  not applicable to non-parallelizable applications.

34  Thanks for listening…  Questions?