
1 Today's Software For Tomorrow's Hardware: An Introduction to Parallel Computing Rahul S. Sampath May 9th, 2007

2 Computational Power Today…

3 Floating Point Operations Per Second (FLOPS)
Humans doing long division: milliflops (1/1000th of one FLOP)
Cray-1 supercomputer, 1976, $8M: 80 MFLOPS
Pentium II, 400 MHz: 100 MFLOPS
Typical high-end PC today: ~1 GFLOPS
Sony PlayStation 3, 2006: 2 TFLOPS
IBM TRIPS, 2010 (one-chip solution, CPU only): 1 TFLOPS
IBM Blue Gene, < 2010 (with 65,536 microprocessors): 360 TFLOPS

4 Why do we need more?
"DOS addresses only 1 MB of RAM because we cannot imagine any application needing more." -- Microsoft, 1980
"640K ought to be enough for anybody." -- Bill Gates, 1981
Bottom line: the demand for computational power will continue to increase.

5 Some Computationally Intensive Applications Today
Computer Aided Surgery
Medical Imaging
MD (molecular dynamics) simulations
FEM simulations with > 10^10 unknowns
Galaxy formation and evolution
17-million-particle Cold Dark Matter cosmology simulation

6 Any application that can be scaled up should be treated as a computationally intensive application.

7 The Need for Parallel Computing
Memory (RAM)
 There is a theoretical limit on the RAM available on your computer:
 32-bit systems: 4 GB (2^32 bytes)
 64-bit systems: 16 exabytes (2^64 bytes, about 16 million TB)
Speed
 Upgrading microprocessors can't help you anymore
 FLOPS is not the bottleneck; memory is
 What we need is more registers
 Think pre-computing, a higher-bandwidth memory bus, L2/L3 caches, compiler optimizations, assembly language
 Asylum
 Or…
 Think parallel…
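
As a sanity check, the 2^32 vs 2^64 limits fall straight out of the pointer width. A minimal C sketch (not from the original slides) that prints the theoretical limit for whatever machine it runs on:

```c
#include <stdio.h>

int main(void) {
    /* Each pointer bit doubles the addressable range:
       32-bit pointers -> 2^32 bytes = 4 GiB,
       64-bit pointers -> 2^64 bytes = 16 EiB (about 16 million TB). */
    size_t bits = 8 * sizeof(void *);
    printf("Pointer width: %zu bits\n", bits);
    printf("Theoretical address space: 2^%zu bytes\n", bits);
    return 0;
}
```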

8 Hacks
If speed is not an issue…
 Is an out-of-core implementation an option? Parallel programs can be converted into out-of-core implementations easily.

9 Parallel Algorithms

10 The Key Questions
Why?
 Memory
 Speed
 Both
What kind of platform?
 Shared Memory
 Distributed Computing
Typical size of the application
 Small (< 32 processors)
 Medium (32 - 256 processors)
 Large (> 256 processors)
How much time and effort do you want to invest?
 How many times will the component be used in a single execution of the program?

11 Factors to Consider in any Parallel Algorithm Design
Give equal work to all processors at all times
 Load Balancing
Give equal amounts of data to all processors
 Efficient Memory Management
Processors should work independently as much as possible
 Minimize communication, especially iterative communication
If communication is necessary, try to do some work in the background as well
 Overlapping communication and computation (see the sketch below)
Try to keep the sequential part of the parallel algorithm as close as possible to the best sequential algorithm
 Optimal Work Algorithm
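
A minimal sketch of the overlap idea, using the non-blocking MPI calls mentioned later in the deck; the ring-neighbor exchange and the payload are illustrative assumptions, not part of the original slides:

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double send_val = (double)rank, recv_val = -1.0;
    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;

    /* Post the communication first, so it proceeds in the background. */
    MPI_Request reqs[2];
    MPI_Irecv(&recv_val, 1, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Isend(&send_val, 1, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[1]);

    /* ... computation that does not need recv_val goes here ... */

    /* Only block once the independent work is done. */
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);
    printf("rank %d received %g from rank %d\n", rank, recv_val, left);
    MPI_Finalize();
    return 0;
}
```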

12 Differences Between Sequential and Parallel Algorithms
Not all data is accessible at all times
All computations must be as localized as possible
 Can't have random access
New dimension to the existing algorithm – division of work
 Which processor does what portion of the work?
If communication cannot be avoided
 How will it be initiated?
 What type of communication?
 What are the pre-processing and post-processing operations?
The order of operations can be critical for performance

13 Parallel Algorithm Approaches
Data-Parallel Approach (see the sketch below)
 Partition the data among the processors
 Each processor will execute the same set of commands
Control-Parallel Approach
 Partition the tasks to be performed among the processors
 Each processor will execute different commands
Hybrid Approach
 Switch between the two approaches at different stages of the algorithm
 Most parallel algorithms fall in this category
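
As an illustration of the data-parallel approach, a minimal MPI sketch; the problem size N and the per-element work are hypothetical:

```c
#include <mpi.h>
#include <stdio.h>

#define N 1000  /* hypothetical global problem size */

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Data-parallel: every rank runs the same commands on its own chunk. */
    int chunk = N / size;
    int start = rank * chunk;
    int end   = (rank == size - 1) ? N : start + chunk; /* last rank takes the remainder */

    double local_sum = 0.0;
    for (int i = start; i < end; i++)
        local_sum += (double)i;  /* stand-in for real per-element work */

    printf("rank %d handled indices [%d, %d), local_sum = %g\n",
           rank, start, end, local_sum);
    MPI_Finalize();
    return 0;
}
```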

14 Performance Metrics
Speedup
Overhead
Scalability
 Fixed Size
 Iso-granular
Efficiency
 Speedup per processor
Iso-Efficiency
 Problem size as a function of p needed to keep efficiency constant
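
In the standard notation of Grama et al. (cited in the references), with T_1 the best sequential running time and T_p the parallel running time on p processors, these metrics are:

```latex
S(p) = \frac{T_1}{T_p} \quad \text{(speedup)}, \qquad
E(p) = \frac{S(p)}{p} = \frac{T_1}{p\,T_p} \quad \text{(efficiency)}, \qquad
T_o(p) = p\,T_p - T_1 \quad \text{(total overhead)}
```

Iso-efficiency then asks how fast the problem size must grow with p for E(p) to stay constant.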

15 The Take Home Message
A good parallel algorithm is NOT a simple extension of the corresponding sequential algorithm.
What model to use? – Problem dependent.
 e.g. a + b + c + d + … = (a + b) + (c + d) + …
 Not much choice really.
It is a big investment, but it can really be worth it.
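
The regrouping in the example is exactly what a parallel reduction exploits: independent pairs can be summed concurrently. A minimal MPI sketch in which each rank contributes one term (the values are illustrative):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Each rank contributes one term; MPI_Reduce combines them,
       typically pairwise in O(log p) steps, like (a+b) + (c+d) + ... */
    double term = (double)(rank + 1);
    double total = 0.0;
    MPI_Reduce(&term, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

    if (rank == 0)
        printf("sum of 1..%d = %g\n", size, total);
    MPI_Finalize();
    return 0;
}
```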

16 Parallel Programming

17 How does a parallel program work?
You request a certain number of processors
You set up a communicator
 Give a unique id to each processor – its rank
Every processor executes the same program
Inside the program
 Query for the rank and use it to decide what to do (see the sketch below)
 Exchange messages between processors using their ranks
 In theory, you only need 3 functions: Isend, Irecv, Wait
 In practice, you can optimize communication depending on the underlying network topology – Message Passing Standards…
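
A minimal sketch of this pattern: every process runs the same binary, queries its rank, and branches on it. Blocking Send/Recv are used here for brevity; the non-blocking Isend/Irecv/Wait trio works the same way (see the earlier overlap sketch):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);                /* set up the communicator */
    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* unique id of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* number of processes requested */

    if (rank == 0) {
        /* rank 0 sends a message to every other rank */
        for (int dest = 1; dest < size; dest++)
            MPI_Send(&dest, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
    } else {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d of %d got %d from rank 0\n", rank, size, msg);
    }
    MPI_Finalize();
    return 0;
}
```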

18 Message Passing Standards
The standards define a set of primitive communication operations. The vendors implementing these on a given machine are responsible for optimizing the operations for that machine.
Popular standards
 Message Passing Interface (MPI)
 OpenMP (Open Multi-Processing) – strictly a shared-memory directive standard rather than a message passing one
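
For contrast, a minimal OpenMP sketch: shared-memory parallelism expressed as compiler directives rather than explicit messages (the loop bounds are illustrative):

```c
#include <omp.h>
#include <stdio.h>

int main(void) {
    double sum = 0.0;
    /* The directive splits the loop iterations across threads; the
       reduction clause combines the per-thread partial sums safely. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 1; i <= 1000; i++)
        sum += (double)i;
    printf("sum = %g (threads available: %d)\n", sum, omp_get_max_threads());
    return 0;
}
```

Typically compiled with an -fopenmp style flag; without it, the pragma is ignored and the code runs sequentially.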

19 Languages that support MPI
Fortran 77
C/C++
Python
MATLAB

20 MPI Implementations
MPICH  ftp://info.mcs.anl.gov/pub/mpi
LAM  http://www.mpi.nd.edu/lam/download
CHIMP  ftp://ftp.epcc.ed.ac.uk/pub/chimp/release
WinMPI (Windows)  ftp://csftp.unomaha.edu/pub/rewini/WinMPI
W32MPI (Windows)  http://dsg.dei.uc.pt/wmpi/intro.html

21 Open Source Parallel Software
PETSc (linear and nonlinear solvers)  http://www-unix.mcs.anl.gov/petsc/petsc-as/
ScaLAPACK (linear algebra)  http://www.netlib.org/scalapack/scalapack_home.html
SPRNG (random number generation)  http://sprng.cs.fsu.edu/
ParaView (visualization)  http://www.paraview.org/HTML/Index.html
NAMD (molecular dynamics)  http://www.ks.uiuc.edu/Research/namd/
Charm++ (parallel objects)  http://charm.cs.uiuc.edu/research/charm/

22 References
Parallel Programming with MPI, Peter S. Pacheco
Introduction to Parallel Computing, A. Grama, A. Gupta, G. Karypis, V. Kumar
MPI: The Complete Reference, William Gropp et al.
http://www-unix.mcs.anl.gov/mpi/
http://www.erc.msstate.edu/mpi
http://www.epm.ornl.gov/~walker/mpi
http://www.erc.msstate.edu/mpi/mpi-faq.html (FAQ)
comp.parallel.mpi (newsgroup)
http://www.mpi-forum.org (MPI Forum)

23 Thank You

