
1 Session 26: Parallel Processing 2
Course: H0344/Organisasi dan Arsitektur Komputer
Year: 2005, Version: 1/1

2 Learning Outcomes
By the end of this session, students are expected to be able to:
- Explain the working principles of parallel processing

3 Outline
- Multiple Processor Organization
- Symmetric Multiprocessors
- Cache Coherence and the MESI Protocol
- Clusters
- Non-Uniform Memory Access
- Vector Computation

4 Cache Coherence and the MESI Protocol
The cache coherence problem: multiple copies of the same data can exist in different caches simultaneously, and if processors are allowed to update their own copies freely, an inconsistent view of memory can result.
Approaches to maintaining coherence:
- Software solutions
- Hardware solutions
  - Directory protocols
  - Snoopy protocols

5 Cache Coherence and the MESI Protocol
MESI cache line states:

                               M (Modified)   E (Exclusive)   S (Shared)        I (Invalid)
This cache line valid?         Yes            Yes             Yes               No
The memory copy is ...         out of date    valid           valid             -
Copies exist in other caches?  No             No              Maybe             Maybe
A write to this line ...       does not go    does not go     goes to bus and   goes directly
                               to bus         to bus          updates cache     to bus
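The table can be read as a small state machine driven by the local processor's reads and writes and by bus transactions snooped from other caches. Below is a minimal, simplified C sketch of that next-state logic; the event names, the other_copies hint, and the single "bus transaction" flag are illustrative assumptions, not any particular processor's controller.

```c
/* Simplified MESI next-state logic for one cache line.
 * Event names and the single "bus" flag are illustrative assumptions. */
#include <stdio.h>

typedef enum { MODIFIED, EXCLUSIVE, SHARED, INVALID } mesi_t;

typedef enum {
    LOCAL_READ,   /* this processor reads the line                */
    LOCAL_WRITE,  /* this processor writes the line               */
    SNOOP_READ,   /* another cache reads the line                 */
    SNOOP_WRITE   /* another cache writes the line (invalidation) */
} event_t;

/* Returns the next state; *bus is set to 1 if the transition needs a
 * bus transaction (miss fill, invalidation broadcast, or write-back). */
static mesi_t next_state(mesi_t s, event_t e, int other_copies, int *bus)
{
    *bus = 0;
    switch (e) {
    case LOCAL_READ:
        if (s == INVALID) {                 /* read miss: fetch over the bus */
            *bus = 1;
            return other_copies ? SHARED : EXCLUSIVE;
        }
        return s;                           /* read hit: state unchanged */
    case LOCAL_WRITE:
        if (s == MODIFIED || s == EXCLUSIVE)
            return MODIFIED;                /* no bus traffic needed */
        *bus = 1;                           /* SHARED/INVALID: announce on bus,
                                               invalidating other copies */
        return MODIFIED;
    case SNOOP_READ:
        if (s == MODIFIED) *bus = 1;        /* supply/flush the dirty data */
        return (s == INVALID) ? INVALID : SHARED;
    case SNOOP_WRITE:
        if (s == MODIFIED) *bus = 1;        /* write back before giving up line */
        return INVALID;
    }
    return s;
}

int main(void)
{
    int bus;
    mesi_t s = INVALID;
    s = next_state(s, LOCAL_READ, 0, &bus);  /* -> EXCLUSIVE, bus fill   */
    s = next_state(s, LOCAL_WRITE, 0, &bus); /* -> MODIFIED, no bus      */
    s = next_state(s, SNOOP_READ, 0, &bus);  /* -> SHARED, flush to bus  */
    printf("final state = %d\n", s);
    return 0;
}
```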

6 Cache Coherence and the MESI Protocol: MESI state transition diagram (figure)

7 Clusters
Four benefits that can be achieved with clustering:
- Absolute scalability
- Incremental scalability
- High availability
- Superior price/performance

8 Clusters: cluster configurations (figure)

9 Clustering methods: benefits and limitations
- Passive standby. A secondary server takes over in case of primary server failure. Benefit: easy to implement. Limitation: high cost, because the secondary server is unavailable for other processing tasks.
- Active secondary. The secondary server is also used for processing tasks. Benefit: reduced cost, because secondary servers can be used for processing. Limitation: increased complexity.
- Separate servers. Each server has its own disks; data are continuously copied from the primary to the secondary server. Benefit: high availability. Limitation: high network and server overhead due to the copying operations.
- Servers connected to disks. Servers are cabled to the same disks, but each server owns its disks; if one server fails, its disks are taken over by the other server. Benefit: reduced network and server overhead because copying is eliminated. Limitation: usually requires disk mirroring or RAID technology to compensate for the risk of disk failure.
- Servers share disks. Multiple servers simultaneously share access to the disks. Benefits: low network and server overhead, and reduced risk of downtime caused by disk failure. Limitation: requires lock manager software; usually used with disk mirroring or RAID technology.
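The passive-standby and active-secondary methods above both depend on the standby detecting that the primary has failed, typically through a heartbeat. As a hedged illustration only, the C sketch below has the standby periodically attempt a TCP connection to an assumed health-check port on the primary and trigger a placeholder takeover after several consecutive misses; the address, port, thresholds, and takeover action are all illustrative assumptions.

```c
/* Illustrative heartbeat monitor for a passive-standby cluster.
 * PRIMARY_HOST, HEALTH_PORT, and the takeover action are assumptions. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

#define PRIMARY_HOST "192.0.2.10"   /* documentation address, placeholder */
#define HEALTH_PORT  7070
#define MAX_MISSES   3

/* Returns 1 if a TCP connection to the primary's health port succeeds. */
static int primary_alive(void)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0) return 0;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(HEALTH_PORT);
    inet_pton(AF_INET, PRIMARY_HOST, &addr.sin_addr);

    int ok = connect(fd, (struct sockaddr *)&addr, sizeof addr) == 0;
    close(fd);
    return ok;
}

int main(void)
{
    int misses = 0;
    for (;;) {
        if (primary_alive()) {
            misses = 0;
        } else if (++misses >= MAX_MISSES) {
            /* Placeholder takeover: a real cluster would claim the service
             * IP address and start the application on the standby here. */
            printf("primary unreachable %d times, taking over\n", misses);
            break;
        }
        sleep(5);                    /* heartbeat interval */
    }
    return 0;
}
```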

10 Clusters: operating system design issues
- Failure management
- Load balancing
- Parallel computation
  - Parallelizing compiler
  - Parallelized applications
  - Parametric computing

11 Non-Uniform Memory Access
- Uniform memory access (UMA)
- Non-uniform memory access (NUMA)
- Cache-coherent NUMA (CC-NUMA)
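On a NUMA machine, a processor reaches memory attached to its own node faster than memory on a remote node, so where data is placed matters. As a hedged illustration only, the sketch below uses Linux's libnuma (assumed to be installed; link with -lnuma) to place a buffer on a chosen node; the node number and buffer size are arbitrary.

```c
/* NUMA-aware allocation sketch using Linux libnuma (compile with -lnuma).
 * Assumes libnuma is available; node choice and size are illustrative. */
#include <stdio.h>
#include <numa.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA is not supported on this system\n");
        return 1;
    }

    size_t size = 64 * 1024 * 1024;          /* 64 MiB buffer */
    int node = 0;                            /* place memory on node 0 */

    /* Allocate pages physically backed by the chosen node, so a thread
     * running on that node sees local (lower-latency) accesses. */
    double *buf = numa_alloc_onnode(size, node);
    if (!buf) {
        fprintf(stderr, "numa_alloc_onnode failed\n");
        return 1;
    }

    for (size_t i = 0; i < size / sizeof(double); i++)
        buf[i] = 0.0;                        /* touch pages to commit them */

    printf("nodes available: 0..%d, buffer placed on node %d\n",
           numa_max_node(), node);

    numa_free(buf, size);
    return 0;
}
```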

12 Non-Uniform Memory Access: CC-NUMA organization (figure)

13 Vector Computation
Matrix multiplication (C = A × B) in FORTRAN:

(a) Scalar processing
      DO 100 I = 1, N
      DO 100 J = 1, N
      C(I, J) = 0.0
      DO 100 K = 1, N
      C(I, J) = C(I, J) + A(I, K) * B(K, J)
  100 CONTINUE

(b) Vector processing
      DO 100 I = 1, N
      C(I, J) = 0.0 (J = 1, N)
      DO 100 K = 1, N
      C(I, J) = C(I, J) + A(I, K) * B(K, J) (J = 1, N)
  100 CONTINUE

(c) Parallel processing
      DO 50 J = 1, N - 1
      FORK 100
   50 CONTINUE
      J = N
  100 DO 200 I = 1, N
      C(I, J) = 0.0
      DO 200 K = 1, N
      C(I, J) = C(I, J) + A(I, K) * B(K, J)
  200 CONTINUE
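For readers more used to C, the sketch below restates the same product: a plain scalar triple loop corresponding to (a), and an OpenMP-annotated loop in the spirit of the FORK construct in (c), where independent outer-loop iterations run on different processors. The matrix size, fill values, and use of OpenMP (compile with -fopenmp) are illustrative assumptions, not part of the original slide.

```c
/* Matrix multiplication C = A x B in C: a scalar triple loop and an
 * OpenMP-parallel version. N and the fill values are illustrative. */
#include <stdio.h>

#define N 256
static double A[N][N], B[N][N], C[N][N];

static void matmul_scalar(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            C[i][j] = 0.0;
            for (int k = 0; k < N; k++)
                C[i][j] += A[i][k] * B[k][j];
        }
}

/* Each iteration of the outer loop is independent, so rows of C can be
 * computed by different processors, much like FORK in version (c). */
static void matmul_parallel(void)
{
    #pragma omp parallel for
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            double sum = 0.0;
            for (int k = 0; k < N; k++)
                sum += A[i][k] * B[k][j];
            C[i][j] = sum;
        }
}

int main(void)
{
    for (int i = 0; i < N; i++)
        for (int j = 0; j < N; j++) {
            A[i][j] = 1.0;
            B[i][j] = 2.0;
        }
    matmul_scalar();
    matmul_parallel();
    printf("C[0][0] = %f\n", C[0][0]);   /* expect 2 * N = 512 */
    return 0;
}
```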

14 Vector Computation (figure)

15 Vector Computation (figure)

16 Vector Computation

      DO 100 J = 1, 50
      CR(J) = AR(J) * BR(J) - AI(J) * BI(J)
  100 CI(J) = AR(J) * BI(J) + AI(J) * BR(J)

(a) Storage to storage
Operation                     Cycles
AR(J) * BR(J) → T1(J)         3
AI(J) * BI(J) → T2(J)         3
T1(J) - T2(J) → CR(J)         3
AR(J) * BI(J) → T3(J)         3
AI(J) * BR(J) → T4(J)         3
T3(J) + T4(J) → CI(J)         3
TOTAL                         12

(b) Register to register
Operation                     Cycles
AR(J) → V1(J)                 1
BR(J) → V2(J)                 1
V1(J) * V2(J) → V3(J)         1
AI(J) → V4(J)                 1
BI(J) → V5(J)                 1
V4(J) * V5(J) → V6(J)         1
V3(J) - V6(J) → V7(J)         1
V7(J) → CR(J)                 1
V1(J) * V5(J) → V8(J)         1
V4(J) * V2(J) → V9(J)         1
V8(J) + V9(J) → V0(J)         1
V0(J) → CI(J)                 1
TOTAL                         12
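The loop computes an element-wise complex product, CR(J) + i·CI(J) = (AR(J) + i·AI(J)) × (BR(J) + i·BI(J)), over 50 elements. For reference only, a C rendering of the same computation is sketched below; a vectorizing compiler for a register-to-register machine would map its two statements onto vector loads, multiplies, adds, and stores much as in column (b). The input values are arbitrary placeholders.

```c
/* Element-wise complex multiply (CR + i*CI) = (AR + i*AI) * (BR + i*BI),
 * the same computation as the DO 100 loop above (50 elements). */
#include <stdio.h>

#define N 50

int main(void)
{
    double ar[N], ai[N], br[N], bi[N], cr[N], ci[N];

    for (int j = 0; j < N; j++) {            /* placeholder input values */
        ar[j] = j;   ai[j] = 1.0;
        br[j] = 2.0; bi[j] = j;
    }

    for (int j = 0; j < N; j++) {
        cr[j] = ar[j] * br[j] - ai[j] * bi[j];   /* real part      */
        ci[j] = ar[j] * bi[j] + ai[j] * br[j];   /* imaginary part */
    }

    printf("cr[1] = %f, ci[1] = %f\n", cr[1], ci[1]);
    return 0;
}
```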

17 Vector Computation

      DO 100 J = 1, 50
      CR(J) = AR(J) * BR(J) - AI(J) * BI(J)
  100 CI(J) = AR(J) * BI(J) + AI(J) * BR(J)

(c) Storage to register
Operation                     Cycles
AR(J) → V1(J)                 1
V1(J) * BR(J) → V2(J)         1
AI(J) → V3(J)                 1
V3(J) * BI(J) → V4(J)         1
V2(J) - V4(J) → V5(J)         1
V5(J) → CR(J)                 1
V1(J) * BI(J) → V6(J)         1
V3(J) * BR(J) → V7(J)         1
V6(J) + V7(J) → V8(J)         1
V8(J) → CI(J)                 1
TOTAL                         10

(d) Compound instruction
Operation                          Cycles
AR(J) → V1(J)                      1
V1(J) * BR(J) → V2(J)              1
AI(J) → V3(J)                      1
V2(J) - V3(J) * BI(J) → V2(J)      1
V2(J) → CR(J)                      1
V1(J) * BI(J) → V4(J)              1
V4(J) + V3(J) * BR(J) → V5(J)      1
V5(J) → CI(J)                      1
TOTAL                              8

