
Slide 1: Introduction to Parallel Computing

Slide 2: Presentation Outline
- Doing science and engineering using HPC
- Basic concepts of parallel computing
- Discussion of HPC hardware
- Programming approaches (HPC software):
  - Library-based approaches
  - Language-based approaches
- HPC facilities at NIIT

Slide 3: High Performance Computing (HPC)
- The prime focus of HPC is performance: the ability to solve the biggest possible problems in the least possible time
- Also called "parallel computing": the use of multiple processors, working in parallel, to solve an application
- Normally such computing is used to solve challenging scientific problems by running simulations; for this reason it is also called "scientific computing" or computational science
- HPC is a highly specialized area: probably our best chance to work for the world's top research and commercial organizations, such as NASA and the European Space Agency (ESA)
- Google is known to have immense computational power, though how much remains unknown!

Slide 4: Doing Science and Engineering Using HPC
- HPC is helping to solve some of the most important problems in science today by pushing software and hardware technology to its limits
- Scientific computing (or computational science) is the field of study concerned with:
  - Constructing mathematical models and numerical solution techniques
  - Using computers to analyze and solve scientific and engineering problems
- Application areas: computer-aided engineering, weather forecast simulations, animated movies (Hollywood!), image processing, cryptography, and hurricane forecasts (path as well as intensity, e.g., Katrina)

Slide 5: HPC Driving Science?
- The Millennium Simulation (computational astrophysics):
  - Heralded as "the" largest-ever model of the Universe
  - Follows the evolution of ten billion "dark matter" particles
  - The simulation ran on a supercomputer for almost a month
- The Blue Brain Project (computational neuroscience):
  - An effort to simulate the workings of a mammalian brain
  - One of the fastest supercomputers in the world is used for the simulations
- Arguably these projects could not be done without HPC

Slide 6: PAM-CRASH: A Case Study from the Automobile Industry
- PAM-CRASH is a parallel application for studying structural deformation, employed in simulations of automotive crashes and similar events
- It is an effective alternative to physical crash tests, which are expensive and time-consuming
- Modern simulations take into account millions of elements; such compute-intensive simulations can only be run on parallel hardware
- Automobile giants including Audi, BMW, and Volkswagen conduct crash simulations using PAM-CRASH

Slide 7: (image slide, no text)

Slide 8: Presentation Outline
- Doing science and engineering using HPC
- Basic concepts of parallel computing
- Discussion of HPC hardware
- Programming approaches (HPC software):
  - Library-based approaches
  - Language-based approaches
- HPC facilities at NIIT

Slide 9: Serial Computation
- Traditionally, software has been written for serial computation:
  - It runs on a single computer having a single Central Processing Unit (CPU)
  - A problem is broken into a discrete series of instructions, executed one after another

Slide 10: Parallel Computation
- Parallel computing is the simultaneous use of multiple compute resources to solve a computational problem:
  - The program runs using multiple CPUs
  - A problem is broken into discrete parts that can be solved concurrently (a minimal sketch follows)
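To make the decomposition concrete, here is a minimal sketch in C, not taken from the slides: a POSIX-threads program that breaks an array sum into discrete chunks, each solved concurrently by its own thread. The file name, thread count, and data are illustrative assumptions.

```c
/* Minimal sketch (illustrative, not from the slides): break an array
 * sum into NTHREADS chunks and solve each chunk concurrently.
 * Compile with: gcc -O2 -pthread sum.c -o sum */
#include <pthread.h>
#include <stdio.h>

#define N        1000000
#define NTHREADS 4           /* assumed thread count */

static double data[N];

typedef struct { int lo, hi; double partial; } Chunk;

static void *sum_chunk(void *arg) {
    Chunk *c = (Chunk *)arg;
    c->partial = 0.0;
    for (int i = c->lo; i < c->hi; i++)   /* each thread owns one part */
        c->partial += data[i];
    return NULL;
}

int main(void) {
    for (int i = 0; i < N; i++) data[i] = 1.0;   /* toy input */

    pthread_t tid[NTHREADS];
    Chunk chunk[NTHREADS];
    for (int t = 0; t < NTHREADS; t++) {
        chunk[t].lo = t * (N / NTHREADS);
        chunk[t].hi = (t + 1) * (N / NTHREADS);
        pthread_create(&tid[t], NULL, sum_chunk, &chunk[t]);
    }

    double total = 0.0;
    for (int t = 0; t < NTHREADS; t++) {  /* join and combine results */
        pthread_join(tid[t], NULL);
        total += chunk[t].partial;
    }
    printf("sum = %.1f\n", total);
    return 0;
}
```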

Slide 11: Flynn's Taxonomy
- There is no authoritative classification of parallel computers!
- Flynn's taxonomy is one such classification, based on the number of instruction and data streams processed by a parallel computer:
  - Single Instruction, Single Data (SISD)
  - Multiple Instruction, Single Data (MISD)
  - Single Instruction, Multiple Data (SIMD)
  - Multiple Instruction, Multiple Data (MIMD): almost all modern computers fall into this category

Slide 12: Flynn's Taxonomy (continued)
- Extensions to Flynn's taxonomy: Single Program, Multiple Data (SPMD), which is a programming model rather than a hardware class
- This classification is largely outdated!

Slide 13: Presentation Outline
- Doing science and engineering using HPC
- Basic concepts of parallel computing
- Discussion of HPC hardware
- Programming approaches (HPC software):
  - Library-based approaches
  - Language-based approaches
- HPC facilities at NIIT

Slide 14: HPC Hardware
- Traditionally HPC has adopted expensive parallel hardware:
  - Massively Parallel Processors (MPP)
  - Symmetric Multi-Processors (SMP)
- Cluster computers: a group of PCs connected through a fast (private) network
- Other classifications:
  - Distributed memory machines
  - Shared memory machines

Slide 15: Massively Parallel Processors (MPP)
- A large parallel processing computer with a shared-nothing approach: the term signifies that each node has its own cache and memory
- Examples include the Cray XT3, T3E, and T3D, and the IBM SP/2

Slide 16: Symmetric Multi-Processors (SMP)
- An SMP is a parallel processing system with a shared-everything approach: the term signifies that every processor shares the main memory and possibly the cache
- Typically an SMP has 2 to 256 processors
- Examples include systems built around processors such as the AMD Athlon, the AMD Opteron 200 and 2000 series, and the Intel Xeon

Slide 17: Cluster Computers
- A group of PCs, workstations, or Macs (called nodes) connected to each other via a fast (and private) interconnect:
  - Each node is an independent computer
- Each cluster has one head node and multiple compute nodes: users log on to the head node and start parallel jobs on the compute nodes
- Such clusters can be built from Commodity-Off-The-Shelf (COTS) components
- A major breakthrough in HPC was the adoption of commodity clusters:
  - Economics
  - Fast interconnects like Myrinet, Infiniband, and Quadrics
- Two popular cluster classifications:
  - Beowulf Clusters (http://www.beowulf.org)
  - Rocks Clusters (http://www.rocksclusters.org)

Slide 18: Cluster Computer
(diagram: eight processors, Proc 0 through Proc 7, each node with its own CPU and memory, exchanging messages over a LAN such as Ethernet, Myrinet, or Infiniband)

Slide 19: Beowulf History
- At the most fundamental level, when two or more computers are used together to solve a problem, they form a cluster
- In 1993, Donald Becker and Thomas Sterling began sketching the details of a commodity-based cluster system: the aim was a cost-effective alternative to large supercomputers
- The initial prototype was a cluster of 16 DX4 processors connected by channel-bonded Ethernet
- The idea was an instant success, largely due to economics; open-source software like Linux, the GNU compilers, PVM, and MPI was a major factor

Slide 20: (photo: Thomas Sterling with Naegling, Caltech's Beowulf cluster)

Slide 21: SMP and Multi-core Clusters
- Most modern commodity clusters have SMP and/or multi-core nodes: processors not only communicate via the interconnect; shared memory programming is also required
- This trend is likely to continue; a new name, "constellations", has even been proposed

Slide 22: Distributed Memory
- Each processor has its own local memory
- Processors communicate with each other via an interconnect

Slide 23: Shared Memory
- All processors have access to shared memory: the notion of a "global address space"

Slide 24: Hybrid
- Modern clusters have a hybrid architecture:
  - Distributed memory for inter-node (between nodes) communication
  - Shared memory for intra-node (within a node) communication
- A minimal sketch of this combination follows
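As an illustration of the hybrid model, here is a minimal sketch in C, not taken from the slides, that combines MPI between nodes with OpenMP threads within a node. The compile command assumes an MPI wrapper compiler with OpenMP support.

```c
/* Minimal hybrid sketch (illustrative): MPI for inter-node messages,
 * OpenMP for intra-node shared memory.
 * Compile with, e.g.: mpicc -fopenmp hybrid.c -o hybrid */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int provided, rank;
    /* ask for an MPI library that tolerates threaded processes */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel   /* shared memory inside the node */
    printf("MPI rank %d, OpenMP thread %d of %d\n",
           rank, omp_get_thread_num(), omp_get_num_threads());

    MPI_Finalize();        /* distributed memory between nodes */
    return 0;
}
```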

Slide 25: The TOP500
- The TOP500 project was started in 1993; its aim is to provide a reliable basis for tracking and detecting trends in HPC
- Twice a year, a list of the sites operating the 500 most powerful computer systems is assembled and released
- The best performance on the Linpack benchmark is used as the measure for ranking the systems
- The latest list was released at Supercomputing 2006, held in Tampa, Florida
- The fastest supercomputer is IBM Blue Gene/L at Lawrence Livermore National Laboratory (LLNL):
  - Linpack (Rmax) performance: 280.6 TeraFLOPS
  - Number of processors: 131,072
  - Main memory: 32,768 GB

Slide 26: (image slide, no text)

Slide 27: The Top 5
1. DOE/NNSA/LLNL, United States: BlueGene/L (eServer Blue Gene Solution), IBM
2. NNSA/Sandia National Laboratories, United States: Red Storm (Sandia/Cray Red Storm, Opteron 2.4 GHz dual core), Cray Inc.
3. IBM Thomas J. Watson Research Center, United States: BGW (eServer Blue Gene Solution), IBM
4. DOE/NNSA/LLNL, United States: ASC Purple (eServer pSeries p5 575, 1.9 GHz), IBM
5. Barcelona Supercomputing Center, Spain: MareNostrum (BladeCenter JS21 Cluster, PPC 970, 2.3 GHz, Myrinet), IBM

Slide 28: The Top 100 on Google Maps (map image)

Slide 29: Presentation Outline
- Doing science and engineering using HPC
- Basic concepts of parallel computing
- Discussion of HPC hardware
- Programming approaches (HPC software):
  - Library-based approaches
  - Language-based approaches
- HPC facilities at NIIT

Slide 30: Writing Parallel Software
- There are mainly two approaches to writing parallel software, i.e., software that can be executed on parallel hardware to exploit computational and memory resources
- The first approach is to use libraries (packages) written in already existing languages like C, Fortran, and Java:
  - Economical
  - These libraries provide primitives (methods) like send() and recv() for communicating data
- The second, more radical approach is to provide new languages:
  - HPC has a history of novel parallel languages
  - These languages provide high-level parallelism constructs (a construct is a syntactic building block of the language, such as a keyword or a directive)

Slide 31: Library-based Approach
- One school of thought is to provide parallelism through message passing between processors
- Such libraries support parallelism in traditional languages like C and Fortran, which has obvious social advantages: existing code and skills carry over
- Two popular messaging approaches:
  - Parallel Virtual Machine (PVM)
  - Message Passing Interface (MPI)
- Other messaging libraries: Message Passing Toolkit (MPT), SHared MEMory (SHMEM), ...
- The Message Passing Interface (MPI) has become the de facto standard for writing HPC applications

Slide 32: Message Passing Interface (MPI)
- MPI is a standard (an interface, or an API):
  - It defines a set of methods that application developers use to write their applications
  - MPI libraries implement these methods
  - MPI itself is not a library; it is a specification document that implementations follow!
- Reasons for popularity:
  - Software and hardware vendors were involved
  - Significant contributions from academia
  - MPICH served as an early reference implementation
  - MPI compilers are simply wrappers around widely used C and Fortran compilers
- MPI is a success story: it is the most widely adopted programming paradigm on IBM Blue Gene systems
- At least two production-quality MPI libraries exist:
  - MPICH2 (http://www-unix.mcs.anl.gov/mpi/mpich2/)
  - Open MPI (http://open-mpi.org)
- There is even a Java library: MPJ Express (http://mpj-express.org)
- A minimal send/receive example follows
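To give a flavor of the API, here is a minimal MPI sketch in C, not taken from the slides: rank 0 sends one integer to rank 1 using the send/receive primitives mentioned on slide 30. The compile and run commands follow the usual wrapper conventions and are assumptions about the local installation.

```c
/* Minimal MPI sketch (illustrative): point-to-point messaging.
 * Compile and run with, e.g.:
 *   mpicc hello.c -o hello && mpiexec -n 2 ./hello */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank;
    MPI_Init(&argc, &argv);                 /* start the MPI runtime */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* which process am I?  */

    if (rank == 0) {
        int msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", msg);
    }

    MPI_Finalize();
    return 0;
}
```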

Slide 33: Language-based Approach
- There is a long history of novel parallel programming languages; the central idea is to support parallelism through easy-to-use constructs
- Social aspects of HPC languages:
  - A dialect or superset of an existing language, or
  - A completely new HPC language (an ambitious approach): what happens to legacy code?
- Conceptually, most HPC languages fall into one of three categories:
  - Shared memory languages: mainly for shared memory platforms like SMPs
  - Partitioned Global Address Space (PGAS) languages: mainly for distributed memory HPC platforms
  - Distributed memory languages: mainly for distributed memory HPC platforms

Slide 34: Shared Memory Languages
- Designed to support parallel programming on shared memory platforms
- OpenMP:
  - Consists of a set of compiler directives, library routines, and environment variables
  - The runtime uses a fork-join model of parallel execution
- Cilk:
  - A design goal was to support asynchronous parallelism
  - A set of keywords: cilk, spawn, sync, ...
- POSIX Threads (Pthreads)
- A minimal OpenMP example follows
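Here is a minimal OpenMP sketch in C, not taken from the slides, showing the directive-based, fork-join style: a single pragma splits the loop among threads, and the reduction clause merges the per-thread partial sums. The midpoint-rule computation of pi is an illustrative choice.

```c
/* Minimal OpenMP sketch (illustrative): fork-join loop parallelism.
 * Compile with: gcc -fopenmp pi.c -o pi */
#include <omp.h>
#include <stdio.h>

int main(void) {
    const int n = 10000000;
    const double h = 1.0 / n;
    double sum = 0.0;

    /* fork: the loop iterations are divided among the threads;
     * join: the reduction combines each thread's partial sum */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) * h;        /* midpoint rule for pi */
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi ~ %.10f (up to %d threads)\n",
           sum * h, omp_get_max_threads());
    return 0;
}
```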

Slide 35: Partitioned Global Address Space (PGAS) Languages
- PGAS is an abstraction that logically divides a process' address space into two halves: private and shared
- PGAS languages follow the so-called Distributed Shared Memory (DSM) model
- Unified Parallel C (UPC): we discuss it in detail later
- Titanium: a Java dialect
- Co-Array Fortran: support for co-arrays
- A minimal UPC sketch follows
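For flavor, here is a minimal UPC sketch (UPC is a PGAS dialect of C), not taken from the slides: the shared array lives in the partitioned global address space, and upc_forall assigns each iteration to the thread with affinity to that element. The compile and run commands assume the Berkeley UPC toolchain.

```c
/* Minimal UPC sketch (illustrative): "owner computes" on a shared array.
 * Compile and run with, e.g.:
 *   upcc vec.c -o vec && upcrun -n 4 ./vec */
#include <upc.h>
#include <stdio.h>

#define N 16

shared int a[N];   /* distributed round-robin across THREADS */

int main(void) {
    int i;

    /* the fourth clause is the affinity expression: the thread
     * owning a[i] executes iteration i */
    upc_forall(i = 0; i < N; i++; &a[i])
        a[i] = i * i;

    upc_barrier;   /* wait until every thread has written its part */

    if (MYTHREAD == 0)
        for (i = 0; i < N; i++)
            printf("a[%d] = %d\n", i, a[i]);
    return 0;
}
```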

Slide 36: Distributed Memory Languages
- These purely distributed-memory languages support HPC on distributed memory platforms
- High Performance Fortran (HPF):
  - Data parallelism
  - An effort to standardize a family of data-parallel Fortran languages
- Fortran M:
  - Ensured deterministic execution
  - Added message passing extensions to Fortran 77
- HPJava: motivated by HPF

Slide 37: A Different Aspect: the Runtime Level
(diagram: HPC programming approaches grouped by runtime layer over the base languages C, Fortran, and Java: library-based messaging (MPI, PVM, SHMEM, GPSHMEM); language extensions based on a global address space (UPC, Co-Array Fortran, Titanium); directive-based languages (HPF, OpenMP); and new languages driven by HPCS (X10, Fortress, Chapel). Credit: Hong Ong, Oak Ridge National Laboratory)

Slide 38: US High Productivity Computing Systems (HPCS)
- Aims:
  - To produce systems that double in productivity and value every 18 months
  - To decrease time-to-solution: both development time and execution time
- Research in software and hardware technology:
  - New programming languages
  - Quantifying productivity
- Funding stages: three vendors are involved: Sun, IBM, and Cray
- Three new programming languages: X10, Chapel, and Fortress

