
1 Parallel Computing, Department of Computer Engineering, Ferdowsi University, Hossain Deldari

2 Lecture organization: Parallel Processing; Supercomputer; Parallel Computer; Amdahl's Law, Speedup, Efficiency; Parallel Machine Architecture; Computational Model; Concurrency Approach; Parallel Programming; Cluster Computing

3 What is Parallel Processing? It is the division of work into smaller tasks and the assignment of those tasks to multiple workers that work on them simultaneously; in other words, parallel processing is the use of multiple processors to execute different parts of the same program at the same time. Difficulties: coordinating, controlling, and monitoring the workers. The main goals of parallel processing are to solve much bigger problems much faster: to reduce the wall-clock execution time of computer programs and to increase the size of the computational problems that can be solved.

4 Supercomputer & parallel computer. What is a Supercomputer? A supercomputer is a computer that is a lot faster than the computers that normal people use (note: this is a time-dependent definition). TOP500 list, June 1993: Manufacturer: TMC; Computer/Procs: CM-5/1024, 1024 processors; Rmax: 59.70; Rpeak: 131.00 (GFlop/s); Installation site: Los Alamos National Laboratory; Country/Year: USA/1993.

5 TOP500 list, June 2003: Manufacturer: NEC; Computer/Procs: Earth-Simulator, 5120 processors; Rmax: 35860.00; Rpeak: 40960.00 (GFlop/s); Installation site: Earth Simulator Center; Country/Year: Japan. Rmax is the maximal LINPACK performance achieved, Rpeak the theoretical peak performance; LINPACK is a benchmark.

6

7 Amdahl's Law, Speedup, Efficiency

8
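Slide 8 presumably carries the formula only as an image; the standard statement of Amdahl's Law, added here for reference:

```latex
% If a fraction f of a program is inherently serial and the remaining
% fraction (1 - f) parallelizes perfectly over p processors, the speedup is
S(p) = \frac{T_1}{T_p} = \frac{1}{f + \frac{1 - f}{p}},
\qquad \lim_{p \to \infty} S(p) = \frac{1}{f}.
% Example: with f = 0.1 and p = 16, S(16) = 1 / (0.1 + 0.9/16) = 6.4,
% so 16 processors give at most a 6.4x speedup, and no number of
% processors can push it past 10x.
```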

9 Efficiency: efficiency is a measure of the fraction of time that a processor spends performing useful work.
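The formula behind that description is not in the transcript; the usual definition, consistent with the slide, is:

```latex
E(p) = \frac{S(p)}{p} = \frac{T_1}{p \, T_p}
% where S(p) is the speedup, T_1 the best serial time and T_p the time on
% p processors. Continuing the example above: E(16) = 6.4 / 16 = 0.4,
% i.e. each processor spends about 40% of its time doing useful work.
```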

10 Shunt Operation

11 Parallel and Distributed Computers: SIMD, MIMD, MISD, clusters

12 SIMD (Single Instruction Multiple Data)

13 MISD (Multiple Instruction Single Data)

14 MIMD (Multiple Instruction Multiple Data)

15 MIMD (cont.)

16 Parallel machine architecture: shared memory model (bus-based, switch-based, NUMA); distributed memory model; distributed shared memory model (page-based, object-based, hardware)

17 Shared memory model

18 Shared memory model (cont.): shared memory or multiprocessor; OpenMP is a standard (C/C++/Fortran). Advantage: easy programming. Disadvantages: design complexity, not scalable.
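Since the slide names OpenMP as the shared-memory standard, here is a minimal sketch of the programming style it enables (the vector addition and the array names are illustrative, not from the course):

```c
#include <omp.h>
#include <stdio.h>

#define N 1000000

int main(void) {
    static double a[N], b[N], c[N];

    /* initialize the input vectors */
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

    /* the OpenMP runtime splits the iterations among threads that all
       see the same arrays in one shared address space */
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];

    printf("c[N-1] = %f\n", c[N - 1]);
    return 0;
}
```

Compiled with an OpenMP-aware compiler (e.g. gcc -fopenmp), the loop parallelizes with a single pragma, which is what makes shared-memory programming comparatively easy.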

19 Bus-based shared memory model: the bus is a bottleneck; not scalable.

20 Switch-based shared memory model: maintenance is difficult; expensive; scalable.

21 NUMA model: NUMA stands for Non-Uniform Memory Access; simulated shared memory; better scalability.

22 Distributed memory model: multicomputer; MPI (Message Passing Interface); easy design; low cost; high scalability; difficult programming.

23 Examples of Network Topology: linear array, ring, mesh, fully connected (figure shows small example graphs with numbered nodes).

24 Examples of Network Topology (cont.): hypercubes (figure: a d = 4 hypercube with 16 nodes labeled 0000 through 1111 in binary, one node marked S).
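For reference (these properties are standard, not taken from the slide), a d-dimensional hypercube connects nodes whose binary labels differ in exactly one bit:

```latex
\text{nodes} = 2^d, \qquad \text{degree of each node} = d, \qquad \text{diameter} = d
% e.g. the d = 4 cube drawn on the slide has 2^4 = 16 nodes, and a message
% needs at most 4 hops: one per bit position in which the source and
% destination labels differ.
```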

25 Distributed shared memory model: simpler abstraction; easier data sharing; portability; easy design with easy programming; low performance (when communication is high).

26 Parallel and Distributed Architecture (Leopold, 2001): a spectrum running SIMD, SMP, NUMA, cluster. Moving along it, the degree of coupling goes from tight to loose, the memory organization from SIMD/shared-memory MIMD to distributed-memory MIMD, the supported grain sizes from fine to coarse, and the communication speed from fast to slow.

27 Computational Model: RAM, PRAM, BSP, LogP, MPI

28 RAM Model

29 PRAM Model (Parallel Random Access Machine): synchronized read-compute-write cycle; variants EREW, ERCW, CREW, CRCW. (Figure: a control unit and processors P1, P2, ..., Pp, each with a private memory, all connected to a global memory.)

30 Bulk Synchronous Parallel (BSP) Model: a generalization of the PRAM model; processor-memory pairs, a communication network, and barrier synchronization. In each super-step the processes execute, then communicate, then barrier-synchronize.

31 BSP cost of a superstep: cost = w + max(hs, hr) · g + l, where w is the maximum number of local operations, hs the maximum number of packets sent, hr the maximum number of packets received, g the communication throughput (gap), p the number of processors, and l the synchronization latency.
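A worked instance of that cost formula, with made-up numbers purely for illustration:

```latex
% Suppose a superstep with w = 10^4 local operations, h_s = h_r = 20
% packets per processor, g = 5 and l = 100. Then
\text{cost} = w + \max(h_s, h_r) \cdot g + l = 10^4 + 20 \cdot 5 + 100 = 10\,200.
```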

32 LogP Model: closely related to BSP; it models asynchronous execution. New parameters: L, the message latency; o, the overhead, defined as the length of time a processor is engaged in the transmission or reception of each message (during this time the processor cannot perform other operations); g, the gap, defined as the minimum time interval between consecutive message transmissions or receptions (the reciprocal of g corresponds to the available per-processor bandwidth); P, the number of processor/memory modules.
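One standard consequence of these parameters (not stated on the slide): the end-to-end time for a single small message is the sender's overhead plus the wire latency plus the receiver's overhead:

```latex
T_{\text{msg}} = o + L + o = L + 2o
% and because a processor may start a new transmission only every g time
% units, sending k back-to-back messages takes roughly (k - 1)\,g + L + 2o
% (assuming g \ge o).
```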

33 LogP (cont.)

34 MPI (Message Passing Interface). What is MPI? A message-passing library specification: a message-passing model, not a compiler specification, not a specific product. For parallel computers, clusters, and heterogeneous networks. Full-featured. Designed to permit (unleash?) the development of parallel software libraries. Designed to provide access to advanced parallel hardware for end users, library writers, and tool developers.
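To make the "library specification" point concrete, a minimal MPI program in C (a generic sketch, not the course's own example):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start the MPI runtime      */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's id          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes  */

    if (rank == 0) {
        int msg = 42;
        /* rank 0 sends a message to every other rank */
        for (int dst = 1; dst < size; dst++)
            MPI_Send(&msg, 1, MPI_INT, dst, 0, MPI_COMM_WORLD);
    } else {
        int msg;
        MPI_Recv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("rank %d of %d received %d\n", rank, size, msg);
    }

    MPI_Finalize();
    return 0;
}
```

Built with mpicc and launched with, e.g., mpirun -np 4, the same source runs on a multicore PC, a cluster, or a heterogeneous network, which is the portability the slide emphasizes.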

35 MPI Layer (figure: Task 1 on Node 1 and Task 2 on Node 2 each consist of an application on top of an MPI communication layer; the applications communicate with each other virtually, while the real communication takes place between the MPI layers).

36 Matrix Multiplication Example

37 PRAM Matrix Multiplication: cost of the PRAM algorithm.
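The cost analysis itself is only an image in the transcript; the standard CREW PRAM analysis for multiplying two n x n matrices with n^3 processors (the slide's own variant may differ in detail) is:

```latex
% each entry c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj} gets n processors:
% all n products are formed in O(1) time, then summed by a balanced
% (tree) reduction in O(\log n) steps, so
T(n) = O(\log n), \qquad P(n) = n^3, \qquad
\text{cost} = P(n) \cdot T(n) = O(n^3 \log n).
```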

38 BSP Matrix Multiplication: cost of the algorithm.
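The derivation on this slide is likewise an image; a common BSP formulation, assuming the matrices are block-distributed over a sqrt(p) x sqrt(p) processor grid, gives per-processor work and communication of:

```latex
w = O\!\left(\frac{n^3}{p}\right), \qquad h = O\!\left(\frac{n^2}{\sqrt{p}}\right),
\qquad \text{cost} \approx \frac{n^3}{p} + g \cdot \frac{n^2}{\sqrt{p}} + S \cdot l
% where S is the (small, constant) number of supersteps of the particular
% algorithm; the slide's exact constants may differ.
```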

39 Concurrency Approach: control parallel, data parallel

40 Control Parallel

41 Data Parallel
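Slides 40 and 41 carry only their titles in the transcript; a small C/OpenMP sketch of the distinction (function and array names are illustrative only): in control parallelism different tasks run concurrently, while in data parallelism the same operation is applied to many data elements at once.

```c
#include <omp.h>
#include <stdio.h>

#define N 8

/* control parallelism: two different tasks run concurrently */
void control_parallel(double *a, double *b) {
    #pragma omp parallel sections
    {
        #pragma omp section
        { for (int i = 0; i < N; i++) a[i] = a[i] * 2.0; }   /* task 1 */
        #pragma omp section
        { for (int i = 0; i < N; i++) b[i] = b[i] + 1.0; }   /* task 2 */
    }
}

/* data parallelism: the same operation on every element of one array */
void data_parallel(double *a) {
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        a[i] = a[i] * a[i];
}

int main(void) {
    double a[N], b[N];
    for (int i = 0; i < N; i++) { a[i] = i; b[i] = i; }
    control_parallel(a, b);
    data_parallel(a);
    printf("a[3] = %.1f, b[3] = %.1f\n", a[3], b[3]);  /* 36.0 and 4.0 */
    return 0;
}
```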

42 The Best granularity for programming

43 Parallel Programming. Explicit parallel programming: Occam, MPI, PVM. Implicit parallel programming: parallel functional programming (ML, ...); concurrent object-oriented programming (COOL, ...); data parallel programming (Fortran 90, HPF, ...).

44 Cluster Computing: a cluster system is a parallel multicomputer built from high-end PCs and a conventional high-speed network, and it supports parallel programming.

45 Cluster Computing (cont.), applications. Scientific computing: simulation, CFD, CAD/CAM, weather prediction, processing large volumes of data. Super server systems: scalable Internet/web servers, database servers, multimedia (video and audio) servers.

46 Cluster Computing (cont.), cluster system building blocks (figure, bottom to top): high-speed network and node hardware; OS; single system image layer; system tool layer; application layer.

47 Cluster Computing (cont.), why cluster computing? Scalability: build a small system first, grow it later. Low cost: hardware based on the COTS (component off-the-shelf) model; software based on freeware from the research community. Easier to maintain. Vendor independent.

48 The End

