Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing
Rahul S. Sampath, May 9th 2007

Computational Power Today…

Floating Point Operations Per Second (FLOPS)
- Humans doing long division: milliflops (1/1000th of one FLOP)
- Cray-1 supercomputer, 1976, $8M: 80 MFLOPS
- Pentium II, 400 MHz: 100 MFLOPS
- Typical high-end PC today: ~1 GFLOPS
- Sony PlayStation 3, 2006: 2 TFLOPS
- IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS
- IBM Blue Gene, < 2010 (with 65,536 microprocessors): 360 TFLOPS

Why do we need more?
- "DOS addresses only 1 MB of RAM because we cannot imagine any application needing more." -- Microsoft
- "640K ought to be enough for anybody." -- Bill Gates
- Bottom line: demand for computational power will continue to increase.

Some Computationally Intensive Applications Today
- Computer-aided surgery
- Medical imaging
- MD simulations
- FEM simulations with > 10^10 unknowns
- Galaxy formation and evolution: 17 million particle Cold Dark Matter cosmology simulation

Any application that can be scaled up should be treated as a computationally intensive application.

The Need for Parallel Computing
Memory (RAM)
- There is a theoretical limit on the RAM that your computer can address.
  - 32-bit systems: 4 GB (2^32 bytes)
  - 64-bit systems: 16 exabytes (2^64 bytes)
Speed
- Upgrading microprocessors can't help you anymore
- FLOPS is not the bottleneck; memory is
- What we need is more registers
- Think pre-computing, higher-bandwidth memory bus, L2/L3 cache, compiler optimizations, assembly language… Asylum…
- Or… think parallel…
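As a quick illustration of the addressing limit (a minimal sketch, not from the original slides), this C program prints the pointer width of the machine it runs on and the corresponding upper bound on addressable memory:

```c
#include <stdio.h>

int main(void) {
    /* The pointer width in bits bounds the size of the virtual address space. */
    unsigned bits = (unsigned)(sizeof(void *) * 8);

    /* 2^32 bytes = 4 GiB; 2^64 bytes = 16 EiB. Use long double to avoid overflow. */
    long double limit_bytes = 1.0L;
    for (unsigned i = 0; i < bits; ++i) {
        limit_bytes *= 2.0L;
    }

    printf("Pointer width: %u bits\n", bits);
    printf("Addressable memory limit: %.0Lf bytes (%.1Lf GiB)\n",
           limit_bytes, limit_bytes / (1024.0L * 1024.0L * 1024.0L));
    return 0;
}
```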

Hacks
If speed is not an issue…
- Is an out-of-core implementation an option? Parallel programs can be converted into out-of-core implementations easily.
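The slides do not show code for this, but the out-of-core idea can be sketched as reading and processing a data set one block at a time instead of loading it all into RAM. The file name and block size below are hypothetical:

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical block size: process 8 million doubles (64 MB) at a time. */
#define BLOCK_DOUBLES (8 * 1024 * 1024)

int main(void) {
    FILE *fp = fopen("data.bin", "rb");   /* hypothetical input file */
    if (!fp) { perror("fopen"); return 1; }

    double *block = malloc(BLOCK_DOUBLES * sizeof(double));
    if (!block) { fclose(fp); return 1; }

    double sum = 0.0;
    size_t n;
    /* Only one block is ever resident in memory; the rest stays on disk. */
    while ((n = fread(block, sizeof(double), BLOCK_DOUBLES, fp)) > 0) {
        for (size_t i = 0; i < n; ++i) {
            sum += block[i];
        }
    }

    printf("sum = %g\n", sum);
    free(block);
    fclose(fp);
    return 0;
}
```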

Parallel Algorithms

The Key Questions
Why?
- Memory
- Speed
- Both
What kind of platform?
- Shared memory
- Distributed computing
Typical size of the application
- Small (< 32 processors)
- Medium (32 - 256 processors)
- Large (> 256 processors)
How much time and effort do you want to invest?
- How many times will the component be used in a single execution of the program?

Factors to Consider in Any Parallel Algorithm Design
- Give equal work to all processors at all times (load balancing)
- Give an equal amount of data to all processors (efficient memory management)
- Processors should work independently as much as possible: minimize communication, especially iterative communication
- If communication is necessary, try to do some work in the background as well (overlapping communication and computation; see the sketch after this list)
- Keep the sequential part of the parallel algorithm as close to the best sequential algorithm as possible (optimal work algorithm)
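A minimal sketch of the overlap idea in C with MPI, using the non-blocking calls (MPI_Isend, MPI_Irecv, MPI_Wait) that the presentation names later. The cyclic neighbour pattern and buffer sizes are illustrative assumptions:

```c
#include <mpi.h>
#include <stdio.h>

#define NB 1024          /* boundary/halo size (illustrative) */
#define NI 1000000       /* interior size (illustrative)      */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    static double boundary[NB], halo[NB], interior[NI];
    for (int i = 0; i < NB; ++i) boundary[i] = rank;
    for (int i = 0; i < NI; ++i) interior[i] = 1.0;

    /* Cyclic neighbours (illustrative communication pattern). */
    int next = (rank + 1) % size;
    int prev = (rank - 1 + size) % size;

    /* Start the exchange, but do not wait for it yet. */
    MPI_Request reqs[2];
    MPI_Isend(boundary, NB, MPI_DOUBLE, next, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Irecv(halo,     NB, MPI_DOUBLE, prev, 0, MPI_COMM_WORLD, &reqs[1]);

    /* Useful work on data that does not depend on the messages in flight. */
    double interior_sum = 0.0;
    for (int i = 0; i < NI; ++i) interior_sum += interior[i];

    /* Block only when the received data is actually needed. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    double halo_sum = 0.0;
    for (int i = 0; i < NB; ++i) halo_sum += halo[i];

    printf("rank %d: interior %.1f, halo %.1f\n", rank, interior_sum, halo_sum);
    MPI_Finalize();
    return 0;
}
```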

Differences Between Sequential and Parallel Algorithms
- Not all data is accessible at all times
- All computations must be as localized as possible: you can't rely on random access
- A new dimension to the existing algorithm – division of work: which processor does what portion of the work?
- If communication cannot be avoided: how will it be initiated? What type of communication? What are the pre-processing and post-processing operations?
- The order of operations can be critical for performance

Parallel Algorithm Approaches
Data-Parallel Approach
- Partition the data among the processors
- Each processor executes the same set of commands
Control-Parallel Approach
- Partition the tasks to be performed among the processors
- Each processor executes different commands
Hybrid Approach
- Switch between the two approaches at different stages of the algorithm
- Most parallel algorithms fall into this category
A sketch of the data-parallel approach follows.
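A minimal sketch of the data-parallel approach in C with MPI, assuming a global array owned by rank 0 that is scattered evenly across processors; every rank runs the same code on its own slice and the partial results are combined with a reduction. The array size and contents are illustrative:

```c
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int n_per_rank = 4;                 /* illustrative slice size */
    const int n_total = n_per_rank * size;

    double *full = NULL;
    if (rank == 0) {
        /* Only the root owns the full data set. */
        full = malloc(n_total * sizeof(double));
        for (int i = 0; i < n_total; ++i) full[i] = (double)i;
    }

    /* Data-parallel step: give each processor an equal share of the data. */
    double local[4];
    MPI_Scatter(full, n_per_rank, MPI_DOUBLE,
                local, n_per_rank, MPI_DOUBLE, 0, MPI_COMM_WORLD);

    /* Every rank executes the same commands on its own slice. */
    double local_sum = 0.0;
    for (int i = 0; i < n_per_rank; ++i) local_sum += local[i];

    /* Combine the partial results on the root. */
    double global_sum = 0.0;
    MPI_Reduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0) {
        printf("sum of 0..%d = %.1f\n", n_total - 1, global_sum);
        free(full);
    }

    MPI_Finalize();
    return 0;
}
```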

Performance Metrics
- Speedup
- Overhead
- Scalability: fixed-size and iso-granular
- Efficiency: speedup per processor
- Iso-efficiency: problem size as a function of p required to keep efficiency constant
The usual definitions are summarized below.
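The standard textbook definitions behind these metrics (written here in the usual notation, with T_1 the best sequential time and T_p the time on p processors; these formulas are not taken from the slides):

```latex
\begin{align*}
  \text{Speedup:}        \quad & S(p) = \frac{T_1}{T_p} \\
  \text{Efficiency:}     \quad & E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p} \\
  \text{Overhead:}       \quad & T_o(p) = p\,T_p - T_1 \\
  \text{Iso-efficiency:} \quad & \text{the growth of problem size } W(p)
                                 \text{ needed to hold } E(p) \text{ constant as } p \text{ increases}
\end{align*}
```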

The Take-Home Message
- A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm.
- What model to use? Problem dependent; e.g. a + b + c + d + … = (a + b) + (c + d) + …; not much choice really.
- It is a big investment, but it can really be worth it.
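The regrouped sum above is exactly what a parallel reduction does: each processor adds up its own pairs (or blocks) and the partial sums are then combined in a tree. A small sequential C sketch of that pairwise grouping (purely illustrative, not from the slides):

```c
#include <stdio.h>

/* Pairwise (tree) summation: adds (a+b), (c+d), ... first, then combines
 * the partial sums. This is the order a parallel reduction imposes. */
static double pairwise_sum(const double *x, int n) {
    if (n == 1) return x[0];
    int half = n / 2;
    return pairwise_sum(x, half) + pairwise_sum(x + half, n - half);
}

int main(void) {
    double x[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    printf("pairwise sum = %.1f\n", pairwise_sum(x, 8));   /* 36.0 */
    return 0;
}
```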

Parallel Programming

How does a parallel program work?
- You request a certain number of processors
- You set up a communicator and give a unique id to each processor – its rank
- Every processor executes the same program
- Inside the program:
  - Query for the rank and use it to decide what to do
  - Exchange messages between processors using their ranks
  - In theory, you only need 3 functions: Isend, Irecv, Wait (see the sketch below)
  - In practice, you can optimize communication depending on the underlying network topology – Message Passing Standards…
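A minimal sketch of such a program in C with MPI: every rank runs the same executable, queries its rank, and uses the three non-blocking primitives named above to pass a message from rank 0 to rank 1. The payload value is illustrative:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);              /* join the communicator MPI_COMM_WORLD */

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* unique id of this processor */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* number of processors requested */

    int value = 0;
    MPI_Request req;

    if (rank == 0 && size > 1) {
        value = 42;                                       /* illustrative payload */
        MPI_Isend(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 0 sent %d to rank 1\n", value);
    } else if (rank == 1) {
        MPI_Irecv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}
```

With MPICH this would typically be compiled with mpicc and launched with something like mpirun -np 2 ./a.out, though the exact commands depend on the installation.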

Message Passing Standards
The standards define a set of primitive communication operations. The vendors implementing these on a given machine are responsible for optimizing those operations for that machine.
Popular standards
- Message Passing Interface (MPI)
- OpenMP (Open Multi-Processing) – strictly a directive-based standard for shared-memory parallelism rather than message passing
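For contrast with the MPI example above, a minimal sketch of OpenMP's shared-memory style in C: a single compiler directive parallelizes the loop across threads, with no explicit messages. It assumes an OpenMP-capable compiler (e.g. gcc -fopenmp); the array size is illustrative:

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double x[N];
    for (int i = 0; i < N; ++i) x[i] = 1.0;

    double sum = 0.0;

    /* All threads share x; the reduction clause combines their partial sums. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; ++i) {
        sum += x[i];
    }

    printf("sum = %.1f (computed with up to %d threads)\n", sum, omp_get_max_threads());
    return 0;
}
```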

Languages that support MPI
- Fortran 77
- C/C++
- Python
- MATLAB

MPI Implementations
- MPICH: ftp://info.mcs.anl.gov/pub/mpi
- LAM
- CHIMP: ftp://ftp.epcc.ed.ac.uk/pub/chimp/release
- WinMPI (Windows): ftp://csftp.unomaha.edu/pub/rewini/WinMPI
- W32MPI (Windows)

Open Source Parallel Software
- PETSc (linear and nonlinear solvers)
- ScaLAPACK (linear algebra)
- SPRNG (random number generation)
- ParaView (visualization)
- NAMD (molecular dynamics)
- Charm++ (parallel objects)

References
- Parallel Programming with MPI, Peter S. Pacheco
- Introduction to Parallel Computing, A. Grama, A. Gupta, G. Karypis, V. Kumar
- MPI: The Complete Reference, William Gropp et al.
- MPI FAQ
- comp.parallel.mpi (newsgroup)
- MPI Forum

Thank You