
Example: Sorting on Distributed Computing Environment Apr 20, 2009 1.



2 About this presentation
This is an example to help you start preparing for the "Enshu" (exercise) part of this class.
- Give a talk about the problem assigned to you, following the theme of the day.
- This example does not show a complete presentation; it only shows the points to be studied, explained, and solved.

3 Sorting a Large Amount of Data
- Data size > memory size of a single computer
  - Ex. 1 to 100 trillion integers
- Distributed parallel sort: distribute the data across multiple computers on a network and sort it.
  ⇒ Uses the combined computational power of multiple computers
  ⇒ Requires communication among the computers
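Here is a minimal sketch of the idea in Python. It is not the method prescribed by these slides. It simulates distributed sorting in a single process:
- each list chunk stands in for one computer's memory;
- each chunk is sorted locally;
- the sorted chunks are gathered and merged.

The function names `scatter` and `distributed_sort` are hypothetical.

```python
import heapq
import random


def scatter(data, p):
    """Split data into p nearly equal contiguous chunks, one per simulated computer."""
    n = len(data)
    return [data[i * n // p:(i + 1) * n // p] for i in range(p)]


def distributed_sort(data, p):
    # Step 1: distribute the data among p simulated computers
    chunks = scatter(data, p)
    # Step 2: each computer sorts its own chunk (these would run in parallel)
    local = [sorted(c) for c in chunks]
    # Step 3: gather the sorted chunks on one node and k-way merge them
    return list(heapq.merge(*local))


if __name__ == "__main__":
    random.seed(0)
    data = [random.randint(0, 10**6) for _ in range(20)]
    print(distributed_sort(data, p=4) == sorted(data))  # True
```

Gathering everything on one node for the final merge is the simplest scheme. However, that node must then hold all of the data, which is exactly what the "data size > memory size" condition rules out. Algorithms that partition the data by key range avoid this final gather.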

4 Computational Infrastructures
- Case 1: PCs in a computer room
  - Use all of the PCs on holidays or at night
  - ~100 PCs (200~400 GB of memory in total)
- Case 2: Supercomputers in Japan
  - Enable "Ultra Large Scale Computation" by using supercomputers all over Japan
  - 10~20 supercomputers
  - Speed: 10~100 TFLOPS each
  - Memory: 10~100 TB each
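The following is a quick size check with illustrative assumptions:
- integers are 32-bit (4 bytes);
- units are decimal (1 GB = 10^9 bytes);
- each PC has 4 GB (400 GB / 100 PCs).

The helper names are hypothetical.

```python
BYTES_PER_INT = 4  # assumption: 32-bit integers; use 8 for 64-bit
GB = 10**9
TB = 10**12


def data_size_bytes(n_integers, bytes_per_int=BYTES_PER_INT):
    return n_integers * bytes_per_int


def min_machines(n_integers, mem_per_machine_bytes, bytes_per_int=BYTES_PER_INT):
    """Fewest machines whose combined memory holds the data (ignores OS and buffers)."""
    size = data_size_bytes(n_integers, bytes_per_int)
    return -(-size // mem_per_machine_bytes)  # ceiling division on integers


if __name__ == "__main__":
    n = 100 * 10**12  # 100 trillion integers
    print(data_size_bytes(n) / TB, "TB")  # 400.0 TB
    print(min_machines(n, 4 * GB))        # 100000 PCs with 4 GB each
    print(min_machines(n, 10 * TB))       # 40 supercomputers with 10 TB each
```

By this arithmetic, 100 trillion 32-bit integers occupy about 400 TB. Even 1 trillion of them (about 4 TB) exceeds the 200~400 GB of combined memory in Case 1.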

5 Network Infrastructures
- Case 1: Ethernet switch
  - Bandwidth: 100 Mbps ~ 1 Gbps
  - Latency: 0.05~0.1 msec
- Case 2: SINET3 (academic network in Japan)
  - Backbone bandwidth: 10~40 Gbps
  - Bandwidth per computer: up to ~10 Gbps
  - Latency: 10~100 msec, depending on the physical length of the network path

6 Bandwidth? Latency?
- Bandwidth: available data transfer rate on the network (bit/sec)
- Latency: minimum time required for each data transfer, regardless of its size
- Estimated cost of one data transfer:
  T = L + S / B
  L: latency (sec), B: bandwidth (bit/sec), S: data size (bit)
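The cost formula can be evaluated directly. This sketch plugs in the upper-end figures from the network slide. The 1 GB message and the 1 KB small message are assumptions chosen only for illustration.

```python
def transfer_time(size_bits, latency_s, bandwidth_bps):
    """Estimated time T = L + S / B for one data transfer."""
    return latency_s + size_bits / bandwidth_bps


if __name__ == "__main__":
    S = 8 * 10**9  # 1 GB of data = 8 * 10^9 bits
    # Case 1: Ethernet switch, 1 Gbps, 0.1 msec latency
    print(transfer_time(S, 0.1e-3, 1e9))      # about 8.0001 s
    # Case 2: SINET3, 10 Gbps per computer, 100 msec latency
    print(transfer_time(S, 100e-3, 10e9))     # about 0.9 s
    # A small message (1 KB = 8192 bits) on SINET3 is dominated by latency
    print(transfer_time(8192, 100e-3, 10e9))  # about 0.1 s
```

Under these assumptions the large transfer is dominated by S / B, while the small message on SINET3 costs almost exactly L. This suggests sending data in a few large messages rather than many small ones, especially in Case 2.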

7 System Infrastructure
- Case 1: The network environment can be considered "reliable", since no other users share the system.
  - Implementing the program is easier if MPI (Message Passing Interface) is installed.
- Case 2: The network environment may be "unreliable", since many users share the network routes.
  - Using MPI is difficult, since the environment is "heterogeneous": different architectures and OSs.

8 Implementation on the Internet
- Everything can be built in the "Application Layer".
- Choose an Internet transport protocol: TCP or UDP
  - Case 1: UDP (or MPI over UDP)
  - Case 2: TCP
- Choose a parallel sorting algorithm
  - Parallel algorithm: an algorithm that solves a problem by dividing it into multiple tasks and running them concurrently

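9 "Layers" of networks
OSI model: divides the functions of network devices into 7 layers
- Application Layer (layer 7)
- Presentation Layer (layer 6)
- Session Layer (layer 5)
- Transport Layer (layer 4)
- Network Layer (layer 3)
- Data Link Layer (layer 2)
- Physical Layer (layer 1)
TCP and UDP are Transport Layer protocols; the sorting program itself runs in the Application Layer.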

10 TCP or UDP
- TCP (Transmission Control Protocol): guarantees the completion of data transfer. Slower but reliable.
- UDP (User Datagram Protocol): no guarantee of delivery. Faster but unreliable.
- Sorting requires every data item to be transferred correctly. ⇒ TCP is preferred.
- On reliable networks such as Case 1, UDP can be used.
- MPI is a message-passing interface whose implementations can run over UDP (or TCP). It guarantees data transfer even when the underlying transport is UDP.
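TCP delivers a byte stream rather than separate messages, so a sorting program has to mark where each block of integers begins and ends. This is a minimal sketch with two assumptions:
- The framing format is chosen here for illustration: an 8-byte count followed by big-endian signed 64-bit integers.
- A local `socket.socketpair()` (a pair of connected stream sockets) stands in for a real TCP connection between two computers.

```python
import socket
import struct


def send_ints(sock, values):
    """Send signed 64-bit integers preceded by an 8-byte count header."""
    payload = struct.pack(f"!{len(values)}q", *values)
    sock.sendall(struct.pack("!Q", len(values)) + payload)


def recv_exact(sock, n):
    """Read exactly n bytes; one recv() on a stream socket may return fewer."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("connection closed before the message was complete")
        buf.extend(chunk)
    return bytes(buf)


def recv_ints(sock):
    """Read one framed block written by send_ints."""
    (count,) = struct.unpack("!Q", recv_exact(sock, 8))
    return list(struct.unpack(f"!{count}q", recv_exact(sock, 8 * count)))


if __name__ == "__main__":
    a, b = socket.socketpair()
    with a, b:
        send_ints(a, [42, -7, 3])
        print(recv_ints(b))  # [42, -7, 3]
```

In this single-threaded demo the sender writes everything before the receiver reads. That only works while the message fits in the socket buffer; a real node would send and receive in separate threads or processes.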

11 Detailed Implementation of the Sorting Program
- Implementation of the parallel algorithm:
  - Cost of computation?
  - Cost of communication?
  - Memory requirements?
- The policies for distributing computation and data affect the performance.

12 Characteristics of each case
- Case 1: low latency and narrow bandwidth
  - Total computational power and memory are small
  - No need for load balancing
- Case 2: high latency and wide bandwidth
  - (Possibly) large amount of computational power and memory
  - Requires load balancing according to the computational power of each machine

13 Implementation for Case 1
- Distribute the same amount of computation and data to each computer.
- Consider the number of PCs to use: communication cost increases with the number of PCs.
- If the target data is large enough, parallelization can achieve sufficient speedup even with 100 PCs.
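The trade-off between computation and communication can be explored with a rough cost model. This is a hypothetical sketch, not a model from the slides. It assumes:
- a local comparison sort costing `sec_per_compare` seconds per comparison (1e-8 s is an assumed value);
- one message to each peer;
- the Case 1 figures of 0.1 msec latency and 1 Gbps bandwidth.

It ignores the final merge, disk I/O, and switch contention.

```python
import math


def estimated_time(n, p, sec_per_compare, latency_s, bandwidth_bps, bits_per_item=32):
    """Hypothetical time model for sorting n items on p identical PCs.

    compute:     local sort of n/p items, about (n/p) * log2(n/p) comparisons
    communicate: one message to each of the other p-1 PCs (p-1 latencies),
                 carrying about (n/p) * (p-1)/p items in total
    """
    m = n / p
    compute = sec_per_compare * m * math.log2(max(m, 2))
    if p == 1:
        return compute
    comm = (p - 1) * latency_s + m * (p - 1) / p * bits_per_item / bandwidth_bps
    return compute + comm


def best_p(n, candidates, **params):
    """Number of PCs with the smallest estimated time among the candidates."""
    return min(candidates, key=lambda p: estimated_time(n, p, **params))


if __name__ == "__main__":
    case1 = dict(sec_per_compare=1e-8, latency_s=0.1e-3, bandwidth_bps=1e9)
    pcs = range(1, 101)
    print(best_p(10**10, pcs, **case1))  # 100: large data keeps benefiting from more PCs
    print(best_p(10**5, pcs, **case1))   # much smaller: latency dominates for small data
```

With these assumed constants, the large input keeps getting faster all the way up to 100 PCs. The small input reaches its minimum at a modest number of PCs, because the (p - 1) latency term grows with p.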

14 Implementation for Case 2
- The amount of computation and data assigned to each computer depends on its relative performance.
- Accurate analysis of the performance of each machine and of the network is important.
- It is likely to be difficult to obtain a sufficient effect from parallelization with a large number of nodes, because of performance degradation from load imbalance and communication cost.
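One simple way to follow this policy is to split the data in proportion to each machine's measured speed. This is a minimal sketch under two assumptions:
- Speeds are given as integer weights, for example TFLOPS figures within the 10~100 range on the infrastructure slide.
- Speed alone decides each share. This ignores the network differences the slide also warns about.

The function name is hypothetical.

```python
def proportional_split(n_items, speeds):
    """Assign n_items to machines in proportion to integer speed weights.

    Uses integer arithmetic plus the largest-remainder method, so the
    shares always sum exactly to n_items.
    """
    total = sum(speeds)
    shares, remainders = [], []
    for s in speeds:
        q, r = divmod(n_items * s, total)
        shares.append(q)
        remainders.append(r)
    leftover = n_items - sum(shares)
    # Give the items lost to integer division to the largest remainders
    order = sorted(range(len(speeds)), key=lambda i: remainders[i], reverse=True)
    for i in order[:leftover]:
        shares[i] += 1
    return shares


if __name__ == "__main__":
    # hypothetical machines rated 10, 20 and 100 TFLOPS
    print(proportional_split(1000, [10, 20, 100]))  # [77, 154, 769]
```

The largest-remainder step hands out the few items lost to integer division, so no data item is dropped or duplicated.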

15 To complete the presentation of your solution
- Detailed information about the infrastructure
- Detailed information about the implementation:
  - Parallel algorithm?
  - Policy for distributing data and computation?
  - Estimate the computation and communication times, and find the optimal distribution.
  - How to distribute the target data, and how to gather the results?
  - Management of multiple jobs
- Standardization of the solution
- Relationship with future networks

16 Exercise
- Find existing parallel algorithms, for example for sorting.
  - Note: look for algorithms for "distributed memory parallel" computing environments, i.e., each computer has its own memory ⇒ explicit communication is required.
- Consider how to implement them on computers connected via the Internet.
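Sample sort is one well-known algorithm for distributed-memory sorting. It is named here only as an example; the slides do not prescribe it. The sketch below simulates it in one Python process, with lists standing in for each computer's memory. The function name and the `oversample` parameter are illustrative assumptions. In a real deployment, three steps would be explicit messages:
- gathering the samples;
- broadcasting the splitters;
- the all-to-all exchange in step 4.

These could be MPI collectives such as `MPI_Alltoallv` or hand-written TCP transfers.

```python
import bisect
import random


def sample_sort(data, p, oversample=4):
    """Simulate sample sort on p computers; returns one sorted bucket per computer."""
    n = len(data)
    if n == 0:
        return [[] for _ in range(p)]
    # 1. Scatter: each computer holds one contiguous chunk of the input
    chunks = [data[i * n // p:(i + 1) * n // p] for i in range(p)]
    # 2. Each computer sorts locally and contributes regularly spaced samples
    #    (roughly `oversample` or more per chunk)
    chunks = [sorted(c) for c in chunks]
    samples = sorted(x for c in chunks for x in c[::max(1, len(c) // oversample)])
    # 3. Pick p-1 splitters from the gathered samples and "broadcast" them
    splitters = [samples[k * len(samples) // p] for k in range(1, p)]
    # 4. All-to-all exchange: item x goes to computer bisect_right(splitters, x)
    buckets = [[] for _ in range(p)]
    for c in chunks:
        for x in c:
            buckets[bisect.bisect_right(splitters, x)].append(x)
    # 5. Each computer sorts what it received; bucket i holds only keys
    #    between splitters[i-1] and splitters[i], so the buckets are globally ordered
    return [sorted(b) for b in buckets]


if __name__ == "__main__":
    random.seed(1)
    data = [random.randint(0, 999) for _ in range(1000)]
    buckets = sample_sort(data, p=4)
    print([len(b) for b in buckets])                        # items held by each computer
    print([x for b in buckets for x in b] == sorted(data))  # True
```

Unlike gathering and merging on one node, every computer ends up holding only its own key range, so no single machine needs memory for the whole data set. The cost is one all-to-all exchange, whose price depends on the latency and bandwidth figures discussed above.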

