
1 PARALLEL PROCESSING COMPARATIVE STUDY

2 CONTEXT How to finish a job in a short time? Solution: use a quicker worker. Drawback: a worker's speed has a limit, so this is inadequate for long jobs.

3 CONTEXT How to finish a calculation in a short time? Solution: use a quicker calculator (processor) [1960-2000]. Drawback: processor speed has reached a limit, so this is inadequate for long calculations.

6 CONTEXT How to finish a job in a short time? Solutions: 1. Use a quicker worker (inadequate for long jobs). 2. Use more than one worker concurrently.

10 CONTEXT How to finish a calculation in a short time? Solutions: 1. Use a quicker processor (inadequate for long calculations). 2. Use more than one processor concurrently: parallelism.

11 CONTEXT Definition: Parallelism is the concurrent use of more than one processing unit (CPUs, processor cores, GPUs, or combinations of them) in order to carry out calculations more quickly.

12 PROJECT GOAL Parallelism needs: 1. A parallel computer (more than one processor). 2. Adapting the calculation to the parallel computer.

14 THE GOAL Parallel computers: several parallel computers exist on the hardware market, and they differ in their architecture. Several classifications exist: based on the instruction and data streams (Flynn classification); based on the memory sharing degree; ...

15 THE GOAL Flynn Classification A. Single Instruction, Single Data stream (SISD)

16 THE GOAL Flynn Classification B. Single Instruction, Multiple Data streams (SIMD)

17 THE GOAL Flynn Classification C. Multiple Instruction, Single Data stream (MISD)

18 THE GOAL Flynn Classification D. Multiple Instruction, Multiple Data streams (MIMD)

19 THE GOAL Memory Sharing Degree Classification A. Shared memory B. Distributed memory

20 THE GOAL Memory Sharing Degree Classification C. Hybrid distributed-shared memory

23 THE GOAL Parallelism needs: 1. A parallel computer (more than one processor). 2. Adapting the calculation to the parallel computer: dividing the calculation and data between the processors, and defining the execution scenario (how the processors cooperate).

24 THE GOAL The adaptation of a calculation to a parallel computer is called parallel processing, and it depends closely on the architecture.

25 THE GOAL Goal: a comparative study between 1. the shared-memory parallel processing approach and 2. the distributed-memory parallel processing approach.

26 PLAN 1. Distributed-memory parallel processing approach 2. Shared-memory parallel processing approach 3. Case study problems 4. Comparison results and discussion 5. Conclusion

27 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH

28 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Distributed-Memory Computer (DMC) = Distributed-Memory System (DMS) = Massively Parallel Processor (MPP)

29 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Distributed-memory computer architecture

30 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Architecture of the nodes. Nodes can be: identical processors → pure DMC; different types of processors → hybrid DMC; different types of nodes with different architectures → heterogeneous DMC.

31 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Architecture of the interconnection network. There is no shared memory space between nodes, so the network is the only way for nodes to communicate. Network performance therefore directly influences the performance of a parallel program on a DMC, and it depends on: 1. Topology 2. Physical connectors (wires, ...) 3. Routing technique. The evolution of DMCs closely follows the evolution of networking.

32 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH The DMC used in our comparative study: a heterogeneous DMC, a modest cluster of workstations with three nodes: a Sony laptop (i3 processor), an HP laptop (i3 processor), and an HP laptop (Core 2 Duo processor). Communication network: 100 Mbit/s Ethernet.

33 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Parallel software development for a DMC. The designer's main tasks: 1. Global calculation decomposition and task assignment 2. Data decomposition 3. Communication scheme definition 4. Synchronization study

34 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Parallel software development for a DMC. Important considerations for efficiency: 1. Minimize communication 2. Avoid barrier synchronization

35 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH Implementation environments. Several implementation environments exist: PVM (Parallel Virtual Machine) and MPI (Message Passing Interface).

36 DISTRIBUTED MEMORY PARALLEL PROCESSING APPROACH MPI application anatomy. All the nodes execute the same code, yet the nodes do not all do the same work. This apparent contradiction is resolved by the SPMD (Single Program, Multiple Data) application form: the single program branches on the process rank, so the processes are organized into one controller and several workers (a minimal sketch follows).
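As an illustration of this controller/workers organization, here is a minimal MPI sketch in C. It is our own sketch, not the presentation's program: the task and result values are hypothetical stand-ins for real work.

```c
/* Minimal SPMD controller/worker sketch in MPI (C).
   Hypothetical illustration; "task" and "result" stand in for real work. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's id        */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total number of processes */

    if (rank == 0) {
        /* Controller: distribute one task per worker, then collect results. */
        for (int w = 1; w < size; w++) {
            int task = w * 100;             /* hypothetical task descriptor */
            MPI_Send(&task, 1, MPI_INT, w, 0, MPI_COMM_WORLD);
        }
        for (int w = 1; w < size; w++) {
            int result;
            MPI_Recv(&result, 1, MPI_INT, w, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("result from worker %d: %d\n", w, result);
        }
    } else {
        /* Worker: receive a task, compute, send the result back. */
        int task, result;
        MPI_Recv(&task, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        result = task + rank;               /* stand-in for real computation */
        MPI_Send(&result, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
    }
    MPI_Finalize();
    return 0;
}
```

Note how the same code runs everywhere and only the branch on `rank` separates the controller from the workers; this is the SPMD form the slide describes.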

37 SHARED MEMORY PARALLEL PROCESSING APPROACH Several shared-memory parallel computers (SMPCs) are on the market, e.g., multi-core PCs (Intel i3/i5/i7, AMD). Which SMPC do we use? The GPU: originally designed for image processing, the GPU is now a domestic supercomputer. Characteristics: the cheapest and fastest shared-memory parallel computer, but hard to design parallel programs for.

38 SHARED MEMORY PARALLEL PROCESSING APPROACH The GPU architecture; the implementation environment

39 SHARED MEMORY PARALLEL PROCESSING APPROACH GPU architecture. Like a classical processing unit, the Graphics Processing Unit is composed of two main components: A. calculation units B. storage units

40 SHARED MEMORY PARALLEL PROCESSING APPROACH

41 SHARED MEMORY PARALLEL PROCESSING APPROACH

42 SHARED MEMORY PARALLEL PROCESSING The GPU architecture; the implementation environment: 1. CUDA: for GPUs manufactured by NVIDIA 2. OpenCL: independent of the GPU architecture

43 SHARED MEMORY PARALLEL PROCESSING CUDA program anatomy

44 SHARED MEMORY PARALLEL PROCESSING Q: How do we execute the code fragments to be parallelized on the GPU? A: By calling a kernel. Q: What is a kernel? A: A kernel is a function, callable from the host and executed on the device, run simultaneously by many threads in parallel (a minimal sketch follows).
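A minimal sketch of a kernel, for illustration only; it is not the presentation's code, and the vecAdd name, arguments, and element-wise addition are our assumptions:

```cuda
/* Minimal CUDA kernel sketch: each thread adds one vector element.
   Illustrative only; the names (vecAdd, a, b, c, n) are hypothetical. */
__global__ void vecAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;  /* global thread index */
    if (i < n)                                      /* guard excess threads */
        c[i] = a[i] + b[i];
}
```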

45 SHARED MEMORY PARALLEL PROCESSING Kernel launch (slides 45-47 walk through the launch syntax; a sketch follows)
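A sketch of how such a launch could look for the vecAdd kernel above. This is our hypothetical host code: the buffer names, problem size, and block size are invented, and the host-to-device copies are omitted for brevity.

```cuda
/* Hypothetical host program launching the vecAdd kernel sketched above. */
#include <cuda_runtime.h>

int main(void) {
    int n = 1 << 20;                        /* hypothetical problem size */
    size_t bytes = n * sizeof(float);
    float *d_a, *d_b, *d_c;                 /* device (GPU) buffers */
    cudaMalloc(&d_a, bytes);
    cudaMalloc(&d_b, bytes);
    cudaMalloc(&d_c, bytes);                /* host-to-device copies omitted */

    int threadsPerBlock = 256;
    int blocks = (n + threadsPerBlock - 1) / threadsPerBlock; /* ceil(n/tpb) */

    /* Triple-angle-bracket launch syntax: <<<grid size, block size>>> */
    vecAdd<<<blocks, threadsPerBlock>>>(d_a, d_b, d_c, n);
    cudaDeviceSynchronize();                /* wait for the kernel to finish */

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    return 0;
}
```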

48 Design recommendations: use the shared memory to reduce the time spent accessing global memory; reduce the number of idle threads (control divergence) to fully utilize the GPU resources (a tiling sketch follows).
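One common way to follow the first recommendation is shared-memory tiling. Below is a minimal sketch for the matrix-multiplication case study; the TILE width, kernel name, and data layout are our illustrative choices, not the presentation's code.

```cuda
#define TILE 16  /* hypothetical tile width */

/* Tiled matrix multiply C = A * B for n x n row-major matrices.
   Each block stages one TILE x TILE tile of A and B in fast shared
   memory, so each global-memory element is read once per tile rather
   than once per thread. */
__global__ void matMulTiled(const float *A, const float *B, float *C, int n) {
    __shared__ float As[TILE][TILE];
    __shared__ float Bs[TILE][TILE];
    int row = blockIdx.y * TILE + threadIdx.y;
    int col = blockIdx.x * TILE + threadIdx.x;
    float sum = 0.0f;

    for (int t = 0; t < (n + TILE - 1) / TILE; t++) {
        /* Stage one tile of A and B, zero-padding past the matrix edge. */
        As[threadIdx.y][threadIdx.x] = (row < n && t * TILE + threadIdx.x < n)
            ? A[row * n + t * TILE + threadIdx.x] : 0.0f;
        Bs[threadIdx.y][threadIdx.x] = (t * TILE + threadIdx.y < n && col < n)
            ? B[(t * TILE + threadIdx.y) * n + col] : 0.0f;
        __syncthreads();                    /* tiles fully loaded */

        for (int k = 0; k < TILE; k++)      /* partial dot product from tile */
            sum += As[threadIdx.y][k] * Bs[k][threadIdx.x];
        __syncthreads();                    /* done reading before next load */
    }
    if (row < n && col < n)
        C[row * n + col] = sum;
}
```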

49 CASE STUDY PROBLEMS The two case-study problems (named in the final summary): 1. Matrix multiplication 2. Pi approximation

51 COMPARISON Comparison criteria; analysis and conclusions

52 COMPARISON Criteria 1: Time-cost factor, defined as TimeCost = T × C, where T is the parallel execution time (in milliseconds) and C is the hardware cost (in Saudi riyals). The hardware costs: GPU: 5000 SAR; cluster of workstations: 9630 SAR.
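As a purely hypothetical illustration of the criterion (the execution times here are invented, not the presentation's measurements): if both platforms finished a run in 40 ms, the GPU would score TimeCost = 40 × 5000 = 200,000 and the cluster TimeCost = 40 × 9630 = 385,200; the lower product indicates the better time-cost trade-off.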

53 COMPARISON

54 COMPARISON Conclusion: the GPU is better when we need to perform a large number of small calculations (few iterations each). However, if we need to perform a calculation with a large number of iterations, the cluster of workstations is the better choice.

55 COMPARISON Criteria 2: Required memory. Matrix multiplication problem. Graphics Processing Unit: the global-memory-based method requires 6 ∗ ... , while the shared-memory-based method requires 8 ∗ ... . Cluster of workstations: the cluster used contains three nodes, and each requires 19/3 ∗ ...

56 COMPARISON Criteria 2: Required memory. Pi approximation problem. Graphics Processing Unit: the size of the arrays depends on the number of threads used; the required memory = ... Cluster of workstations: a small amount of memory is used on each node, almost 15 ∗ ...

57 COMPARISON Criteria 2: Required memory. Conclusion: we cannot judge which parallel approach is better on the required-memory criterion; this criterion depends on the intrinsic characteristics of the problem at hand.

58 COMPARISON Criteria 3: The gap between the theoretical complexity and the effective complexity, calculated by Gap = ((T_exp / T_theo) − 1) × 100, where T_exp is the experimental parallel time, T_theo = T_seq / p is the theoretical parallel time, T_seq is the sequential time, and p is the number of processing units.
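A worked example with invented numbers (not the presentation's measurements): with a sequential time T_seq = 1000 ms on p = 4 processing units, T_theo = 1000 / 4 = 250 ms; a measured T_exp = 300 ms then gives Gap = ((300 / 250) − 1) × 100 = 20%. A null or negative gap means the program met or beat the theoretical time.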

59 COMPARISON Criteria 3: The gap between the theoretical complexity and the effective complexity. Cluster of workstations

60 COMPARISON Criteria 3: The gap between the theoretical complexity and the effective complexity. Graphics Processing Unit

61 COMPARISON Criteria 3: The gap between the theoretical complexity and the effective complexity. Conclusion: on the GPU, the measured execution time of a parallel program can be lower than the theoretically expected time. That is impossible to achieve on a cluster of workstations because of the communication overhead. To minimize the gap, or keep it constant, on the cluster of workstations the designer has to keep the number and sizes of communicated messages as constant as possible while the problem size increases.

62 COMPARISON

63 COMPARISON Criteria 4: Efficiency

64 COMPARISON Criteria 4: Efficiency. Conclusion: the efficiency (speedup) is much better on the GPU than on the cluster of workstations.

65 COMPARISON Important note

66 COMPARISON Important note (continued)

67 COMPARISON Criteria 5: Difficulty of development: CUDA vs. MPI

68 COMPARISON Criteria 6: Necessary hardware and software materials. GPU: NVIDIA GT 525M. Cluster of workstations: 3 PCs, a switch, an internet modem, and wires.

70 CONCLUSION

71 Parallel Processing Comparative Study. Shared-memory parallel processing approach: the Graphics Processing Unit (GPU). Distributed-memory parallel processing approach: the cluster of workstations. The GPU and the cluster are the two main building blocks of the world's fastest computers (such as Shaheen). To compare them we used two different problems (matrix multiplication and Pi approximation) and six measurement criteria.

GPU (shared memory) | Cluster (distributed memory)
More adequate for the data-level parallelism form | More adequate for the task-level parallelism form
A big number of small calculations | One big calculation
Memory requirement ~ problem characteristics | Memory requirement ~ problem characteristics
Can beat the expected run time (null or negative gap) | Impossible, due to communication overhead
Complicated design and programming | Less complicated
Very practical implementation environment | Complicated implementation environment

