
1 LYU0703 Parallel Distributed Programming on PS3
  Huang Hiu Fung 05700512
  Wong Chung Hoi 05596742
  Supervised by Prof. Michael R. Lyu
  Department of Computer Science and Engineering, CUHK
  2007-2008 Final Year Project Presentation (1st term)

2 Agenda
  Background Information
  Architecture of PlayStation® 3
  Principles of Parallel Programming
  Optimization of the ADVISER program:
  1. Sequential Approach
  2. Parallel Approach
  Conclusion
  Future Works
  Q&A

3 Background Information
  Limitations of single-core processors:
  1. Memory access latency
  2. Wire delays
  3. Power consumption

4 Power Consumption
  P = C × V² × F
  P = power
  C = capacitance
  V = voltage
  F = processor frequency (cycles per second)

5 Development of Multi-Core Processor
  Fig. 1.4 Growth of No. of Cores in Processors

6 Development of Multi-Core Processor
  Reduce power consumption
    - use multiple cores with low frequency instead of one with high frequency
  Efficient processing of multiple tasks
    - divide the computation work
    - execute among the cores concurrently
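
The power argument can be made concrete with the usual dynamic-power relation P = C × V² × F, which uses the symbols defined on slide 4. The sketch below is only illustrative: the capacitance, voltage and frequency figures are made-up values, and the assumption that a core at half the frequency can run at 80% of the voltage is hypothetical, not a measurement from this project.

```python
def dynamic_power(c, v, f):
    """Dynamic power P = C * V^2 * F (watts if C is in farads, V in volts, F in hertz)."""
    return c * v * v * f

# Illustrative values only (assumptions, not data from the project).
C = 1e-9        # switched capacitance in farads
V_HIGH = 1.2    # supply voltage at full frequency
F_HIGH = 3.2e9  # full frequency in Hz

single_core = dynamic_power(C, V_HIGH, F_HIGH)

# Assumption: at half the frequency the core can run at 80% of the voltage.
V_LOW = 0.8 * V_HIGH
dual_core = 2 * dynamic_power(C, V_LOW, F_HIGH / 2)

# Two slower cores offer the same total cycles per second for less power.
ratio = dual_core / single_core  # 2 * 0.8^2 * 0.5 = 0.64
```

Under these assumed numbers the two slower cores draw about 64% of the power of the single fast core. Frequency alone would give no saving; the gain comes from the squared voltage term.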

7 Project Objectives
  Show the need for parallel programming to optimize computation-intensive applications
  Study the features of parallel programming and compare the sequential and parallel approaches
  Optimize an application and demonstrate the improvement gained from parallel programming

8 Architecture of PlayStation® 3 (PS3)
  A multi-core machine produced by Sony, built on the Cell Broadband Engine
  Strong computation power
  Open platform for other applications and development

9 Cell Broadband Engine (Cell BE)
  PPE – Power Processor Element
  SPE – Synergistic Processor Element
  EIB – Element Interconnect Bus

10 Power Processor Element (PPE)
  Based on the 64-bit PowerPC architecture
  General-purpose operation
  Designed for control-intensive work
  Controls I/O of main memory and other devices through the OS
  Controls all 8 SPEs
  Fig. 2.5 Design of PPE

11 Synergistic Processor Element (SPE)
  Designed to provide computation performance
  SPU (Synergistic Processor Unit) – performs the allocated task
  LS (Local Store) – the only memory the SPU addresses directly
  MFC (Memory Flow Controller) – controls data transfer
  8 SPEs in total in the Cell
  Only 6 accessible
    1 reserved for system software
    1 disabled
  Fig. 2.6 Design of a SPE

12 Element Interconnect Bus (EIB)
  Internal communication bus inside the Cell
  Connects the different elements: PPE, SPEs, memory controller
  Fig. 2.7 Data Flow and Program Control

13 Principles of Parallel Programming
  Parallel algorithm                      Serial algorithm
  multiple processing units               single processing unit
  communication overhead                  no communication overhead
  higher code complexity                  straightforward code
  must ensure load balance between PUs    everything is done by the CPU

14 Concept of Load Balance
  Distribute data evenly
  Total runtime depends on the busiest processing element
  Idle processing elements waste computation time
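
The load-balance idea can be sketched in a few lines. This is a toy model, not code from the project: the function names and the work figures are hypothetical, and each number stands for the time one processing element needs for its share.

```python
def total_runtime(work_per_pe):
    """Parallel runtime is set by the busiest processing element."""
    return max(work_per_pe)

def idle_time(work_per_pe):
    """Time each processing element spends waiting for the busiest one."""
    longest = max(work_per_pe)
    return [longest - w for w in work_per_pe]

balanced = [10, 10, 10, 10, 10, 10]  # 60 units of work spread evenly
unbalanced = [25, 7, 7, 7, 7, 7]     # the same 60 units, one overloaded element

balanced_runtime = total_runtime(balanced)
unbalanced_runtime = total_runtime(unbalanced)
wasted = sum(idle_time(unbalanced))
```

Both distributions carry the same total work, but the unbalanced one takes 2.5 times as long because five elements sit idle while the sixth finishes.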

15 Methods of Parallelism
  Data parallelism
  Task parallelism

16 Parallel Architecture
  Flynn's taxonomy
                   Single Instruction    Multiple Instruction
  Single Data      SISD                  MISD
  Multiple Data    SIMD                  MIMD

17 SISD
  Traditional computer
  von Neumann model

18 SIMD
  Same instruction on all data
  Data parallelism
  SIMD intrinsic functions

19 MISD
  No well-known system
  Mentioned for completeness

20 MIMD
  Different instructions on different data
  Task parallelism
  Further broken down into
    – Shared Memory System
    – Distributed Memory System

21 Shared Memory System
  Access to central memory for data
  PS3: achieved by the MFC issuing DMA commands

22 Distributed Memory System
  Each PE has its own memory
  PS3: each SPE has a 256 KB Local Store
  PS3 is a hybrid shared-distributed memory system

23 ADVISER
  Comparing 2 video clips
  1. Generating meaningful data (in the form of numbers) for the frames of each video
  2. Comparing and looking for the most similar frames
  3. Locating the similar segment, which consists of a series of very similar frames

24 Input
  2 folders: “Repository” and “Target”
  hl3 file = vector of 1024 double-precision values
  Input                   No. of hl3 files
  Target directory        5473
  Repository directory    7547

25 Processing
  hl3 file = vector of 1024 double-precision values
  Similarity = difference value between File P and File Q
  The smaller, the better
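
A minimal sketch of the per-pair comparison, assuming the difference value is the sum of squared element-wise differences. That form is inferred from the loop on slide 36 (subtract, multiply by itself, accumulate). Whether the original program applies a square root or any normalisation afterwards is not stated, so this should be read as an assumption. The function name `difference` is hypothetical.

```python
def difference(p, q):
    """Sum of squared element-wise differences between two feature vectors."""
    if len(p) != len(q):
        raise ValueError("vectors must have the same length")
    return sum((a - b) ** 2 for a, b in zip(p, q))

identical = difference([1.0, 2.0, 3.0], [1.0, 2.0, 3.0])  # 0.0: a perfect match
apart = difference([0.0, 0.0], [3.0, 4.0])                # 9 + 16 = 25.0
```

A value of zero means the two vectors are identical, and larger values mean less similar frames, which matches "the smaller, the better".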

26 Output
  M “Target” files, N “Repository” files
  O(M * N) comparisons
  Computation time = 633 sec
  Flash demo
  target hl3 1    most match repository A    difference value = ??
  target hl3 2    most match repository B    difference value = ??
  target hl3 3    most match repository C    difference value = ??
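
The O(M * N) search can be sketched as a pair of nested loops: every target vector is compared against every repository vector and the smallest difference wins. With the file counts from slide 24 that is 5473 × 7547, about 41 million vector comparisons. The function name and the tiny data set are hypothetical, and the squared-difference measure is the same assumption as the one inferred from slide 36.

```python
def best_matches(targets, repository):
    """For each target vector return (index of closest repository vector, difference)."""
    results = []
    for t in targets:                        # M iterations
        best_index, best_diff = None, float("inf")
        for j, r in enumerate(repository):   # N iterations for each target
            d = sum((a - b) ** 2 for a, b in zip(t, r))
            if d < best_diff:
                best_index, best_diff = j, d
        results.append((best_index, best_diff))
    return results

# Toy data: 2 targets, 3 repository entries, 2 values per vector.
targets = [[0.0, 0.0], [5.0, 5.0]]
repository = [[4.0, 4.0], [1.0, 0.0], [5.0, 6.0]]
matches = best_matches(targets, repository)
```

Each target is independent of the others, which is what makes this loop a natural candidate for data parallelism.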

27 Parallel Version
  Data parallelism
  Split data evenly among 6 SPEs
  Computation time with 6 SPEs = 330 sec
  Flash demo
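
An even split can be sketched as index ranges whose sizes differ by at most one. The slide does not say whether the target or the repository files are divided, so splitting the 5473 target files is an assumption here, and `split_evenly` is a hypothetical helper.

```python
def split_evenly(n_items, n_workers):
    """Return (start, end) index ranges whose sizes differ by at most one."""
    base, extra = divmod(n_items, n_workers)
    ranges, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)  # the first `extra` workers take one more
        ranges.append((start, start + size))
        start += size
    return ranges

chunks = split_evenly(5473, 6)          # 5473 = 6 * 912 + 1
sizes = [end - start for start, end in chunks]
```

With 6 workers, one range holds 913 items and the other five hold 912, so no SPE waits long for another.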

28 Parallel Version
  Expected speed-up: 6X
  Actual speed-up: 2X (relative to the PC version)
  PC, PPU and SPE all run at different speeds
  Computation time with CPU (PC) = 633 sec
  Computation time with 1 SPE = 1928 sec
  Computation time with PPU = 3119 sec
  Speed: CPU > SPE > PPU
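
The gap between the expected and the actual speed-up depends on which baseline is used. A short calculation with the timings quoted on slides 27 and 28 (the variable names are just labels for those figures):

```python
def speedup(baseline_sec, improved_sec):
    """How many times faster the improved run is than the baseline."""
    return baseline_sec / improved_sec

PC_SEC = 633        # sequential version on the PC
ONE_SPE_SEC = 1928  # sequential version on a single SPE
SIX_SPE_SEC = 330   # data-parallel version on 6 SPEs

vs_pc = speedup(PC_SEC, SIX_SPE_SEC)            # about 1.9
vs_one_spe = speedup(ONE_SPE_SEC, SIX_SPE_SEC)  # about 5.8
```

Measured against a single SPE, the 6-SPE run is close to the expected 6X. The 2X figure appears only because one unoptimized SPE is roughly three times slower than the PC's CPU.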

29 Time Attack
  1. SIMD intrinsic function
  2. Changing data type
  3. Double buffering
  4. Parallel read
  5. Distributing job to idling PPE
  6. SIMD on loop counter
  7. Loop unrolling

30 SIMD Intrinsic Function
  Addition, subtraction, multiplication, etc.
  Operates on 128-bit registers
  Data type: double (64 bits)
  Speed-up: 2X

31 Changing Data Type to int
  Precision is not important
  Major speed-up comes from SIMD intrinsics
  Data type: int (32 bits)
  Total speed-up: 4X
  Computation time = 71 sec

32 Changing Data Type to float
  SPE is specialized for high-precision computation
  No intrinsic for int data type at all
  Data type: float (32 bits)
  Saves data conversion time
  Speed-up of 30%
  Computation time = 49 sec

33 Double Buffering
  Saves communication time
  MFC transfers data while the SPU computes
  2 buffers
    – Prefetching
    – Processing
  The program is not communication-heavy
  Minor speed-up
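
Double buffering can be sketched as follows: while one buffer is being processed, the next one is already being fetched. On the Cell the prefetch would be an asynchronous DMA issued through the MFC. In this portable sketch a background thread stands in for the MFC, and `fetch`, `process` and `fetch_demo` are hypothetical callables, not names from the project.

```python
from concurrent.futures import ThreadPoolExecutor

def process_double_buffered(fetch, process, n_chunks):
    """Process chunk k while chunk k+1 is being fetched in the background."""
    results = []
    if n_chunks == 0:
        return results
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch, 0)              # fill the first buffer
        for k in range(n_chunks):
            current = pending.result()               # wait for the prefetched buffer
            if k + 1 < n_chunks:
                pending = pool.submit(fetch, k + 1)  # start prefetching the next buffer
            results.append(process(current))         # compute on the current buffer meanwhile
    return results

def fetch_demo(k):
    """Stand-in for a data transfer: returns four consecutive integers."""
    return list(range(4 * k, 4 * k + 4))

totals = process_double_buffered(fetch_demo, sum, 3)
```

The technique pays off only when transfer time is comparable to compute time, which is consistent with the minor speed-up reported for a program that is not communication-heavy.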

34 Parallel Reading for All Files
  Read “Target” and “Repository” concurrently
  Share the file-reading job among SPEs
  No improvement as predicted; even slower
  Reason: the hard disk cannot handle concurrent requests
  Failed attempt

35 Distributing Job to Idling PPE
  PPE's current job: read files, distribute files, collect results
  Use stall time to do some computation
  Relatively low computation power of PPE
  No significant improvement
  Increased program complexity
  Approach abandoned

36 Applying SIMD for Loop Counter
  Major computation power consumed in:
  1. initialize i = 0, diff = (0, 0, 0, 0).
  2. for i < Number of float numbers in a file / Number of floats packed in a register
     A. temp = SIMD subtraction on vector i in “Target” and “Repository” file.
     B. diff = SIMD addition (SIMD multiplication (temp, temp), diff).
  3. i = i + 1.
  4. Loop back to 2.

37 Applying SIMD for Loop Counter
  Try to optimize step 3
  Apply SIMD to the loop counter
  Addition and comparison operations on the counter are reduced by a factor of 8

38 Applying SIMD for Loop Counter
  1. initialize i = (0, 1, 2, 3, 4, 5, 6, 7), diff = (0, 0, 0, 0).
  2. for i[0] < Number of float numbers in a file / Number of floats packed in a register
     temp = SIMD subtraction on vector i[0] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[1] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[2] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[3] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[4] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[5] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[6] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
     temp = SIMD subtraction on vector i[7] in “Target” and “Repository” file.
     diff = SIMD addition (SIMD multiplication (temp, temp), diff).
  3. i = SIMD addition (i, (8, 8, 8, 8, 8, 8, 8, 8)).
  4. Loop back to 2.
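
The two loop shapes can be compared in a portable emulation. Here 4-element Python lists stand in for 128-bit registers holding 4 floats, and `vsub`, `vmul` and `vadd` are hypothetical stand-ins for the SIMD intrinsics. For simplicity the unrolled version uses plain offsets i+0 to i+7 rather than a vector-valued counter, so it shows the reduction in counter updates and comparisons, not the exact register layout of the slide.

```python
LANES = 4  # floats per 128-bit register

def vsub(a, b):
    return [x - y for x, y in zip(a, b)]

def vmul(a, b):
    return [x * y for x, y in zip(a, b)]

def vadd(a, b):
    return [x + y for x, y in zip(a, b)]

def to_vectors(values):
    """Pack a flat list into 4-wide vectors; the length must be a multiple of 4."""
    return [values[k:k + LANES] for k in range(0, len(values), LANES)]

def accumulate(diff, t, r):
    """Return diff + (t - r)^2, lane by lane."""
    temp = vsub(t, r)
    return vadd(vmul(temp, temp), diff)

def diff_rolled(target, repo):
    """One comparison and one counter update per vector."""
    diff, i, counter_updates = [0.0] * LANES, 0, 0
    while i < len(target):
        diff = accumulate(diff, target[i], repo[i])
        i += 1
        counter_updates += 1
    return sum(diff), counter_updates

def diff_unrolled8(target, repo):
    """One comparison and one counter update per 8 vectors (count must be a multiple of 8)."""
    diff, i, counter_updates = [0.0] * LANES, 0, 0
    while i < len(target):
        diff = accumulate(diff, target[i], repo[i])
        diff = accumulate(diff, target[i + 1], repo[i + 1])
        diff = accumulate(diff, target[i + 2], repo[i + 2])
        diff = accumulate(diff, target[i + 3], repo[i + 3])
        diff = accumulate(diff, target[i + 4], repo[i + 4])
        diff = accumulate(diff, target[i + 5], repo[i + 5])
        diff = accumulate(diff, target[i + 6], repo[i + 6])
        diff = accumulate(diff, target[i + 7], repo[i + 7])
        i += 8
        counter_updates += 1
    return sum(diff), counter_updates

# 1024 values per file, as on slide 24, give 256 vectors of 4 floats.
target = to_vectors([float(k % 7) for k in range(1024)])
repo = to_vectors([float(k % 5) for k in range(1024)])
rolled = diff_rolled(target, repo)
unrolled = diff_unrolled8(target, repo)
```

Both functions return the same difference value, but the unrolled loop performs 32 counter updates instead of 256.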

39 Result of the parallel, with SIMD, float input, SIMD for loop counter PS3 version
  No. of SPU used               1    2    3    4    5    6
  Read input time (sec)         4    5    3    4    4    4
  Total Elapsed time (sec)    286  146   97   75   60   51
  Net Elapsed time (sec)      282  141   94   71   56   47

40 Result of the parallel, with SIMD, float input, SIMD for loop counter PS3 version

41 Result of the parallel, with SIMD, float input, SIMD for loop counter PS3 version
  Little improvement (about 4%)
  Shows that further loop unrolling may give faster performance
  The best performance becomes 47 sec

42 Loop Unrolling
  Proved that optimizing the loop can improve performance
  Complete loop unrolling
  More obvious speed-up

43 Result of the parallel, with SIMD, float input, loop unrolling PS3 version
  No. of SPU used               1    2    3    4    5    6
  Read input time (sec)         3    4    3    3    4    3
  Total Elapsed time (sec)    159   82   55   42   35   30
  Net Elapsed time (sec)      156   78   52   39   31   27
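
The scaling behaviour in this table can be summarised as speed-up and parallel efficiency relative to the 1-SPU run. The calculation below uses the total elapsed times from the table. Efficiency here means speed-up divided by the number of SPUs, a common definition that the slides themselves do not use.

```python
spus = [1, 2, 3, 4, 5, 6]
total_sec = [159, 82, 55, 42, 35, 30]  # total elapsed time from the table

# Speed-up relative to the run on one SPU.
speedups = [total_sec[0] / t for t in total_sec]

# Efficiency: fraction of the ideal linear speed-up actually achieved.
efficiency = [s / n for s, n in zip(speedups, spus)]
```

With 6 SPUs the speed-up is 5.3, an efficiency of about 88%. The roughly constant 3 to 4 seconds of input reading is one part of the run that does not shrink as SPUs are added.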

44 Result of the parallel, with SIMD, float input, loop unrolling PS3 version

45 Result of the parallel, with SIMD, float input, loop unrolling PS3 version
  45% faster
  Ultimate best performance becomes 27 sec

46 Conclusion of Optimization
  PC version: 633 sec
  PS3 with 1 SPU (i.e. sequential version on PS3): 1928 sec
  Final optimized version on PS3: 27 sec
  23 times faster than the PC version
  71 times faster than the sequential version on PS3

47 Conclusion of Optimization

48 Future Works
  Port the whole ADVISER application to PlayStation® 3
  Optimize throughout the whole application

49 Q&A

50 The End

