COMPE 462 Parallel Computing


1 COMPE 462 Parallel Computing
Ali Yazıcı – Ziya Karakaya, Fall, Atılım University

2 Chapter 1: Parallel Programming

3 Why Use Parallel Computing?
Main reasons:
Save time and/or money: in theory, throwing more resources at a task will shorten its time to completion, with potential cost savings. Parallel clusters can be built from cheap, commodity components.
Solve larger problems: many problems are so large and/or complex that it is impractical or impossible to solve them on a single computer, especially given limited computer memory. For example:
"Grand Challenge" problems (en.wikipedia.org/wiki/Grand_Challenge) requiring PetaFLOPS and PetaBytes of computing resources.
Web search engines/databases processing millions of transactions per second.

4 Demand for Computational Speed
Continual demand for greater computational speed from a computer system than is currently possible. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a "reasonable" time period.

5 Grand Challenge Problems
One that cannot be solved in a reasonable amount of time with today's computers. Obviously, an execution time of 10 years is always unreasonable. Examples:
Databases, data mining
Oil exploration
Web search engines, web-based business services
Medical imaging and diagnosis
Pharmaceutical design
Management of national and multi-national corporations
Financial and economic modeling
Advanced graphics and virtual reality, particularly in the entertainment industry
Networked video and multi-media technologies
Collaborative work environments
Modeling large DNA structures
Global weather forecasting
Modeling motion of astronomical bodies

6 Weather Forecasting
Atmosphere modeled by dividing it into 3-dimensional cells. Calculations for each cell repeated many times to model the passage of time.

7 Global Weather Forecasting Example
Suppose the whole global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles (10 cells high) - about 5 × 10^8 cells. Suppose each calculation requires 200 floating point operations. In one time step, 10^11 floating point operations are necessary. To forecast the weather over 7 days using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes 10^6 seconds, or over 10 days. To perform the calculation in 5 minutes requires a computer operating at 3.4 Tflops (3.4 × 10^12 floating point operations/sec).
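A quick back-of-the-envelope check of these figures as a small C sketch (the cell count, 200 operations per cell, and the 1-minute step are taken from the slide; the rest is plain arithmetic):

#include <stdio.h>

int main(void) {
    double cells          = 5e8;     /* ~5 x 10^8 one-mile cells, 10 cells high  */
    double flops_per_cell = 200.0;   /* floating point operations per cell       */
    double ops_per_step   = cells * flops_per_cell;   /* ~10^11 ops per step     */
    double steps          = 7.0 * 24.0 * 60.0;        /* 7 days at 1-min steps   */
    double total_ops      = ops_per_step * steps;     /* ~10^15 ops in total     */

    printf("seconds at 1 Gflops : %g\n", total_ops / 1e9);    /* ~10^6 s, >10 days      */
    printf("flops for 5 minutes : %g\n", total_ops / 300.0);  /* ~3.4 x 10^12 = 3.4 Tflops */
    return 0;
}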

8 Modeling Motion of Astronomical Bodies
Each body is attracted to each other body by gravitational forces. Movement of each body predicted by calculating the total force on each body. With N bodies, N - 1 forces to calculate for each body, or approx. N^2 calculations. (N log2 N for an efficient approximate algorithm.) After determining the new positions of the bodies, the calculations are repeated.
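A minimal C sketch of the direct O(N^2) force calculation described above (the body structure, the gravitational-constant parameter, and the omission of the position update are simplifying assumptions, not code from the course):

#include <math.h>

typedef struct { double x, y, z, mass; double fx, fy, fz; } Body;

/* One iteration of the direct O(N^2) step: every body accumulates the
   gravitational pull of every other body (positions assumed distinct). */
void compute_forces(Body *b, int n, double G) {
    for (int i = 0; i < n; i++) {
        b[i].fx = b[i].fy = b[i].fz = 0.0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = b[j].x - b[i].x;
            double dy = b[j].y - b[i].y;
            double dz = b[j].z - b[i].z;
            double r  = sqrt(dx*dx + dy*dy + dz*dz);
            double f  = G * b[i].mass * b[j].mass / (r * r);
            b[i].fx += f * dx / r;   /* force component along the unit vector */
            b[i].fy += f * dy / r;
            b[i].fz += f * dz / r;
        }
    }
}

The N log2 N figure refers to tree-based approximation methods such as Barnes-Hut, which treat groups of distant bodies as single masses.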

9 A galaxy might have, say, 10^11 stars.
Even if each calculation is done in 1 ms (an extremely optimistic figure), it takes 10^9 years for one iteration using the N^2 algorithm and almost a year for one iteration using an efficient N log2 N approximate algorithm.

10 Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).

11 Parallel Computing Motives
Using more than one computer, or a computer with more than one processor, to solve a problem. Motives:
Usually faster computation - the very simple idea that n computers operating simultaneously can achieve the result n times faster (in practice it will not be n times faster, for various reasons).
Other motives include fault tolerance, a larger amount of available memory, ...

12 The Real World is Massively Parallel

13 Parallel Computing vs Traditional Computing

14 Background
Parallel computers - computers with more than one processor - and their programming - parallel programming - have been around for more than 40 years.

15 Gill writes in 1958: "... There is therefore nothing new in the idea of parallel programming, but its application to computers. The author cannot believe that there will be any insuperable difficulty in extending it to computers. It is not to be expected that the necessary programming techniques will be worked out overnight. Much experimenting remains to be done. After all, the techniques that are commonly used in programming today were only won at the cost of considerable toil several years ago. In fact the advent of parallel programming may do something to revive the pioneering spirit in programming which seems at the present to be degenerating into a rather dull and routine occupation ..." Gill, S. (1958), "Parallel Programming," The Computer Journal, vol. 1, April, pp.


17 Speedup Factor
S(p) = t_s / t_p = (Execution time using one processor, best sequential algorithm) / (Execution time using a multiprocessor with p processors)
where t_s is the execution time on a single processor and t_p is the execution time on a multiprocessor. S(p) gives the increase in speed gained by using the multiprocessor. Use the best sequential algorithm on the single-processor system; the underlying algorithm for the parallel implementation might be (and usually is) different.

18 Speedup factor can also be cast in terms of computational steps:
Can also extend time complexity to parallel computations.
S(p) = (Number of computational steps using one processor) / (Number of parallel computational steps with p processors)

19 Maximum Speedup
Maximum speedup is usually p with p processors (linear speedup). It is possible to get superlinear speedup (greater than p), but usually there is a specific reason, such as:
Extra memory in the multiprocessor system
Nondeterministic algorithm

20 Maximum Speedup
Factors limiting speedup:
Communication time
Extra computations in the parallel algorithm (e.g. re-evaluation of constants locally)
Idle time of some processors

21 Maximum Speedup Amdahl’s law
[Figure: a fraction f of the computation, the serial section f·t_s, cannot be divided; the remaining (1 - f)t_s is parallelizable. (a) On one processor the total time is t_s. (b) On p processors the parallelizable part takes (1 - f)t_s / p, so the total time is f·t_s + (1 - f)t_s / p.]

22 Amdahl's Law
Speedup factor is given by:
S(p) = t_s / (f·t_s + (1 - f)·t_s / p) = p / (1 + (p - 1)f)
This equation is known as Amdahl's law.

23 Speedup against number of processors
[Plot: speedup S(p) against number of processors p (up to 20) for serial fractions f = 0%, 5%, 10%, and 20%; only f = 0% gives linear speedup, and the curves flatten quickly as f grows.]

24 Amdahl’s law Even with infinite number of processors, maximum speedup is limited to 1/f. Example With only 5% of computation being serial, maximum speedup is 20, irrespective of number of processors. COMPE472 Parallel Computing

25 Superlinear Speedup example - Searching
(a) Searching each sub-space sequentially
[Figure (a): the search space is divided into p sub-spaces, each taking t_s/p to search sequentially. The solution is found Δt into the (x + 1)th sub-space, i.e. after time x·t_s/p + Δt, where x is indeterminate.]

26 (b) Searching each sub-space in parallel
[Figure (b): with p processors each searching one sub-space simultaneously, the solution is found after time Δt.]

27 Speed-up is then given by:
S(p) = (x · t_s/p + Δt) / Δt

28 Worst case for the sequential search is when the solution is found in the last sub-space searched. Then the parallel version offers the greatest benefit:
S(p) = (((p - 1)/p) · t_s + Δt) / Δt → ∞ as Δt tends to zero

Least advantage for the parallel version is when the solution is found in the first sub-space of the sequential search, i.e.
S(p) = Δt / Δt = 1
The actual speed-up depends upon which sub-space holds the solution, but it could be extremely large.

30 Scalability
Architecturally scalable system: an increase in the number of processors leads to an increase in speedup.
Architectural/algorithmic scalability: an increase in data size can be accommodated by an increase in the number of processors.
Gustafson argued that Amdahl's law is not significant when the parallel execution time is held fixed: as p increases, the problem size is also increased to maintain a constant parallel execution time.
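The slide does not give Gustafson's formula; assuming the standard scaled-speedup form S(p) = p - (p - 1)s, where s is the serial fraction of the fixed parallel execution time, a small C sketch contrasts the two views:

#include <stdio.h>

/* Amdahl: fixed problem size, f = serial fraction of the sequential run. */
static double amdahl(double f, int p)    { return (double)p / (1.0 + (p - 1) * f); }

/* Gustafson: fixed parallel execution time, s = serial fraction of the parallel run;
   the problem size grows with p. */
static double gustafson(double s, int p) { return (double)p - (p - 1) * s; }

int main(void) {
    double serial = 0.05;                 /* 5% serial, as in the earlier example */
    for (int p = 4; p <= 1024; p *= 4)
        printf("p=%5d  Amdahl=%7.2f  Gustafson=%8.2f\n",
               p, amdahl(serial, p), gustafson(serial, p));
    return 0;
}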

31 Message-Passing Computations
In a message-passing environment, the parallel execution time consists of two parts, a computation part and a communication part:
t_p = t_comp + t_comm
The computation/communication ratio, t_comp / t_comm, can be used as a metric.

32 Types of Parallel Computers
Two principal types:
Shared memory multiprocessor
Distributed memory multicomputer

33 Shared Memory Multiprocessor

34 Conventional Computer
Consists of a processor executing a program stored in a (main) memory. Each main memory location is identified by its address. Addresses start at 0 and extend to 2^b - 1 when there are b bits (binary digits) in the address.
[Figure: the processor fetches instructions from main memory and reads/writes data to or from it.]

35 Shared Memory Multiprocessor System
Natural way to extend the single-processor model - have multiple processors connected to multiple memory modules, such that each processor can access any memory module:
[Figure: processors connected through an interconnection network to memory modules forming one address space.]

36 Simplistic view of a small shared memory multiprocessor
[Figure: several processors attached to a shared memory over a single bus.]
Examples: dual Pentiums, quad Pentiums.

37 Quad Pentium Shared Memory Multiprocessor
[Figure: four processors, each with its own L1 cache, L2 cache, and bus interface, connected by a processor/memory bus to a memory controller with shared memory and to an I/O interface on an I/O bus.]

38 Programming Shared Memory Multiprocessors
Threads - the programmer decomposes the program into individual parallel sequences (threads), each able to access variables declared outside the threads. Example: Pthreads.
A sequential programming language with preprocessor compiler directives to declare shared variables and specify parallelism. Example: OpenMP - an industry standard - needs an OpenMP compiler.
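A minimal OpenMP sketch of the directive-based style in C (a toy array sum; the array size and the reduction example are illustrative, not course code):

#include <stdio.h>
#include <omp.h>

#define N 1000000

static double a[N];   /* file scope so the large array is not on the stack */

int main(void) {
    double sum = 0.0;

    for (int i = 0; i < N; i++) a[i] = 1.0;   /* sequential initialization */

    /* The pragma asks the OpenMP compiler to split the loop across threads;
       'reduction' gives each thread a private partial sum combined at the end. */
    #pragma omp parallel for reduction(+:sum)
    for (int i = 0; i < N; i++)
        sum += a[i];

    printf("sum = %.0f using up to %d threads\n", sum, omp_get_max_threads());
    return 0;
}

Compilers such as gcc need an OpenMP flag (e.g. -fopenmp) for the pragma to take effect; without it the loop simply runs sequentially.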

39 A sequential programming language with added syntax to declare shared variables and specify parallelism. Example: UPC (Unified Parallel C) - needs a UPC compiler.
A parallel programming language with syntax to express parallelism - the compiler creates executable code for each processor (not now common).
A sequential programming language with a parallelizing compiler asked to convert it into parallel executable code - also not now common.

40 Message-Passing Multicomputer
Complete computers connected through an interconnection network:
[Figure: computers, each with its own processor and local memory, exchange messages over the interconnection network.]

41 Interconnection Networks
Limited and exhaustive interconnections:
2- and 3-dimensional meshes
Hypercube (not now common)
Using switches:
Crossbar
Trees
Multistage interconnection networks
Peer-to-peer: c(c - 1)/2 links with c processors

42 Two-dimensional array (mesh)
[Figure: a two-dimensional mesh of computers/processors, each linked to its neighbours.]

43 Three-dimensional hypercube
In a d-dimensional hypercube, each node connects to one node in each dimension. A 3-dimensional hypercube is shown above. Each node is assigned a 3-bit address; the addresses of directly connected nodes differ in only 1 bit. The diameter is log2 p.
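A small C sketch of this addressing property (node numbering follows the usual binary-address convention; the program only prints which nodes each node links to):

#include <stdio.h>

/* In a d-dimensional hypercube, node 'a' is directly linked to the d nodes
   whose addresses differ from 'a' in exactly one bit: a ^ (1 << k). */
int main(void) {
    int d = 3;                          /* 3-dimensional hypercube: 8 nodes, diameter 3 */
    for (int a = 0; a < (1 << d); a++) {
        printf("node %d links to:", a);
        for (int k = 0; k < d; k++)
            printf(" %d", a ^ (1 << k));
        printf("\n");
    }
    return 0;
}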

44 Four-dimensional hypercube
Hypercubes were popular in the 1980s - not now.

45 Crossbar switch
[Figure: processors connected to memories through a grid of switches.]
Provides exhaustive connections using one switch for each connection. Used in shared memory systems.

46 Tree
[Figure: a tree network with a root, switch elements at the internal nodes, links, and processors at the leaves.]
Example: Thinking Machines' Connection Machine CM-5 (1993).

47 Multistage Interconnection Network Example: Omega network
2 × 2 switch elements (straight-through or crossover connections).
[Figure: an 8 × 8 Omega network connecting inputs 000-111 to outputs 000-111 through stages of 2 × 2 switches.]

48 Communication Methods
Circuit switching:
Establish the path, then maintain/reserve the links for message passing.
The simple telephone system is an example.
Used in early multicomputers (Intel iPSC/2).
Packet switching:
Divide the message into "packets".
Packet = source/destination addresses + data; the maximum packet size is known.
The mail system is an example.
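A minimal C sketch of the packet idea (the field names, widths, and the 64-byte maximum payload are illustrative assumptions, not any particular machine's format):

#include <stdint.h>
#include <string.h>

#define MAX_PAYLOAD 64   /* packets have a known maximum size (assumed here) */

/* A message is split into packets, each carrying routing information plus data. */
typedef struct {
    uint16_t src;                     /* source node address            */
    uint16_t dest;                    /* destination node address       */
    uint16_t seq;                     /* position within the message    */
    uint16_t len;                     /* bytes of payload actually used */
    uint8_t  payload[MAX_PAYLOAD];
} Packet;

/* Split 'msg' (n bytes) into packets; returns the number of packets written. */
int packetize(const uint8_t *msg, int n, uint16_t src, uint16_t dest, Packet *out) {
    int count = 0;
    for (int off = 0; off < n; off += MAX_PAYLOAD, count++) {
        int chunk = (n - off < MAX_PAYLOAD) ? (n - off) : MAX_PAYLOAD;
        out[count].src  = src;
        out[count].dest = dest;
        out[count].seq  = (uint16_t)count;
        out[count].len  = (uint16_t)chunk;
        memcpy(out[count].payload, msg + off, (size_t)chunk);
    }
    return count;
}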

49 Distributed Shared Memory
Making the main memory of a group of interconnected computers look as though it is a single memory with a single address space. Then shared memory programming techniques can be used.
[Figure: computers with their own processors exchange messages over an interconnection network while presenting a single shared memory.]

50 Flynn’s Classifications
Flynn (1966) created a classification for computers based upon instruction streams and data streams:
Single instruction stream-single data stream (SISD) computer: a single-processor computer - a single stream of instructions is generated from the program. Instructions operate upon a single stream of data items.

51 Single Instruction, Single Data (SISD):
A serial (non-parallel) computer.
Single instruction: only one instruction stream is being acted on by the CPU during any one clock cycle.
Single data: only one data stream is being used as input during any one clock cycle.
Deterministic execution.
This is the oldest and, until recently, the most prevalent form of computer.
Examples: most PCs, single-CPU workstations and mainframes.

52 SISD

53 Multiple Instruction Stream-Multiple Data Stream (MIMD) Computer
General-purpose multiprocessor system - each processor has a separate program and one instruction stream is generated from each program for each processor. Each instruction operates upon different data. Both the shared memory and the message-passing multiprocessors so far described are in the MIMD classification.

54 MIMD Multiple Instruction, Multiple Data (MIMD):
Currently the most common type of parallel computer; most modern computers fall into this category.
Multiple instruction: every processor may be executing a different instruction stream.
Multiple data: every processor may be working with a different data stream.
Execution can be synchronous or asynchronous, deterministic or non-deterministic.
Examples: most current supercomputers, networked parallel computer "grids", and multi-processor SMP computers - including some types of PCs.

55 MIMD

56 Single Instruction Stream-Multiple Data Stream (SIMD) Computer
A specially designed computer - a single instruction stream from a single program, but multiple data streams exist. Instructions from the program are broadcast to more than one processor. Each processor executes the same instruction in synchronism, but using different data. Developed because a number of important applications mostly operate upon arrays of data.

57 SIMD Single Instruction, Multiple Data (SIMD):
A type of parallel computer.
Single instruction: all processing units execute the same instruction at any given clock cycle.
Multiple data: each processing unit can operate on a different data element.
This type of machine typically has an instruction dispatcher, a very high-bandwidth internal network, and a very large array of very small-capacity instruction units.
Best suited for specialized problems characterized by a high degree of regularity, such as image processing.
Synchronous (lockstep) and deterministic execution.
Two varieties: processor arrays and vector pipelines.
Examples - processor arrays: Connection Machine CM-2, MasPar MP-1, MP-2; vector pipelines: IBM 9000, Cray C90, Fujitsu VP, NEC SX-2, Hitachi S820.
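As a rough illustration of the SIMD idea in plain C, the loop below applies the same operation to every data element; on SIMD hardware (or with a vectorizing compiler) one instruction would update many elements per cycle. This is a scalar sketch, not SIMD intrinsics:

/* The same operation (multiply-add) is applied to every element of the arrays;
   a vectorizing compiler can map this regular loop onto SIMD instructions. */
void saxpy(int n, float a, const float *x, float *y) {
    for (int i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}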

58 SIMD

59 Multiple Program Multiple Data (MPMD) Structure
Within the MIMD classification, each processor will have its own program to execute:
[Figure: two processors, each with its own program generating its own instruction stream and operating on its own data.]

60 Single Program Multiple Data (SPMD) Structure
Single source program written and each processor executes its personal copy of this program, although independently and not in synchronism. Source program can be constructed so that parts of the program are executed by certain computers and not others depending upon the identity of the computer.
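A minimal SPMD sketch using MPI in C: every process runs this same program and branches on its own identity (rank); the division of roles here is purely illustrative:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's identity          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes        */

    /* Same program everywhere, different behaviour depending on identity. */
    if (rank == 0)
        printf("rank 0 of %d: coordinating\n", size);
    else
        printf("rank %d of %d: computing my share of the work\n", rank, size);

    MPI_Finalize();
    return 0;
}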

61 Networked Computers as a Computing Platform
A network of computers became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the early 1990s. Several early projects. Notable:
Berkeley NOW (network of workstations) project.
NASA Beowulf project.

62 Key advantages:
Very high performance workstations and PCs readily available at low cost.
The latest processors can easily be incorporated into the system as they become available.
Existing software can be used or modified.

63 Software Tools for Clusters
Based upon message-passing parallel programming:
Parallel Virtual Machine (PVM) - developed in the late 1980s. Became very popular.
Message-Passing Interface (MPI) - standard defined in the 1990s.
Both provide a set of user-level libraries for message passing. Use with regular programming languages (C, C++, ...).
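A minimal MPI message-passing sketch in C (one integer sent from rank 0 to rank 1; the tag and value are illustrative only):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[]) {
    int rank, value;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {                 /* sender */
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {          /* receiver */
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank 1 received %d from rank 0\n", value);
    }

    MPI_Finalize();
    return 0;
}

Such a program is compiled with an MPI wrapper compiler (e.g. mpicc) and launched on at least two processes (e.g. mpirun -np 2 ./a.out).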

64 Beowulf Clusters*
A group of interconnected "commodity" computers achieving high performance with low cost. Typically using commodity interconnects - high-speed Ethernet - and the Linux OS.
* Beowulf comes from the name given by the NASA Goddard Space Flight Center cluster project.

65 Cluster Interconnects
Originally fast Ethernet on low-cost clusters; Gigabit Ethernet - an easy upgrade path.
More specialized/higher performance:
Myrinet - Gbits/sec - disadvantage: single vendor
cLAN
SCI (Scalable Coherent Interface)
QsNet
InfiniBand - may be important as InfiniBand interfaces may be integrated on next-generation PCs

66 Dedicated cluster with a master node
[Figure: a dedicated cluster in which users reach a master node on the external network; the master node connects through a switch interface (with an optional second Ethernet up link) to the compute nodes.]

