Presentation on theme: "Grid Computing, B. Wilkinson, 2004, 7a.1 Computational Grids" - Presentation transcript:

1 Computational Grids (Grid Computing, B. Wilkinson, 2004, slide 7a.1)

2 Computational Problems. Problems that involve a large number of computations and usually large amounts of data.

3 Demand for Computational Speed. There is a continual demand for greater computational speed from a computer system than is currently possible. Areas requiring great computational speed include numerical modeling and simulation of scientific and engineering problems. Computations must be completed within a "reasonable" time period.

4 Grand Challenge Problems. A grand challenge problem is one that cannot be solved in a reasonable amount of time with today's computers. Obviously, an execution time of 10 years is always unreasonable. Examples: modeling large DNA structures, global weather forecasting, modeling the motion of astronomical bodies.

5 Weather Forecasting. The atmosphere is modeled by dividing it into three-dimensional cells. The calculation for each cell is repeated many times to model the passage of time.

6 Global Weather Forecasting Example. Suppose the global atmosphere is divided into cells of size 1 mile × 1 mile × 1 mile to a height of 10 miles - about 5 × 10^8 cells. Suppose each calculation requires 200 floating point operations; then one time step needs about 10^11 floating point operations. To forecast the weather over a 7-day period using 1-minute intervals, a computer operating at 1 Gflops (10^9 floating point operations/s) takes about 10^6 seconds, or over 10 days.
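As a quick check of the slide's arithmetic, here is a small back-of-the-envelope calculation in C (a sketch only; the cell count, the 200 operations per cell, and the 1 Gflops machine speed are the assumptions taken from the slide):

    #include <stdio.h>

    int main(void) {
        double cells = 5e8;             /* ~1-mile cells up to a height of 10 miles      */
        double ops_per_cell = 200.0;    /* floating point operations per cell per step   */
        double steps = 7.0 * 24 * 60;   /* 7 days of 1-minute time steps                 */
        double machine_rate = 1e9;      /* 1 Gflops                                      */

        double total_ops = cells * ops_per_cell * steps;  /* ~1e15 operations */
        double seconds = total_ops / machine_rate;        /* ~1e6 seconds     */
        printf("total operations: %.2e\n", total_ops);
        printf("time at 1 Gflops: %.2e s (about %.1f days)\n", seconds, seconds / 86400.0);
        return 0;
    }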

7 Modeling Motion of Astronomical Bodies. Each body is attracted to every other body by gravitational forces. The movement of each body is predicted by calculating the total force on it. With N bodies, there are N - 1 forces to calculate for each body, or approximately N^2 calculations in total (N log2 N for an efficient approximate algorithm). After determining the new positions of the bodies, the calculations are repeated.
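A minimal C function sketch of the direct O(N^2) force summation described above (illustrative only; the Body structure and the softening constant are invented for the example):

    #include <math.h>

    #define G 6.674e-11   /* gravitational constant */

    typedef struct { double x, y, z, mass; } Body;

    /* Total gravitational force on body i from all other bodies:
       N - 1 force evaluations per body, ~N^2 for the whole system per time step. */
    void total_force(const Body *b, int n, int i, double f[3]) {
        f[0] = f[1] = f[2] = 0.0;
        for (int j = 0; j < n; j++) {
            if (j == i) continue;
            double dx = b[j].x - b[i].x;
            double dy = b[j].y - b[i].y;
            double dz = b[j].z - b[i].z;
            double r2 = dx*dx + dy*dy + dz*dz + 1e-9;   /* softening avoids division by zero */
            double r = sqrt(r2);
            double fmag = G * b[i].mass * b[j].mass / r2;
            f[0] += fmag * dx / r;
            f[1] += fmag * dy / r;
            f[2] += fmag * dz / r;
        }
    }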

8 A galaxy might have, say, 10^11 stars. Even if each calculation could be done in 1 ms (an extremely optimistic figure), it takes almost a year for one iteration using the N log2 N algorithm, and 100 years for 100 iterations. Simulations typically require millions of iterations.

9 Astrophysical N-body simulation by Scott Linssen (undergraduate UNC-Charlotte student).

10 High Performance Computing (HPC). Traditionally achieved by using multiple computers together - parallel computing. Simple idea! Using multiple computers (or processors) simultaneously should make it possible to solve a problem faster than with a single computer.

11 Using Multiple Computers or Processors. Key concept: dividing the problem into parts that can be computed simultaneously. Parallel programming: programming a computing platform consisting of more than one processor or computer. The concept is very old (about 50 years).

12 High Performance Computing. A long history: multiprocessor systems of various types (1950s onwards), supercomputers (1960s-80s), cluster computing (1990s), grid computing (2000s)?? Maybe, but let's first look at how to achieve HPC.

13 Speedup Factor. S(p) = ts / tp, where ts is the execution time on a single processor and tp is the execution time on a multiprocessor. S(p) gives the increase in speed obtained by using the multiprocessor. ts should be the time of the best sequential algorithm for a single processor; the parallel algorithm is usually different.

14 Maximum Speedup. Maximum speedup is usually p with p processors (linear speedup). It is possible to get superlinear speedup (greater than p), but usually there is a specific reason, such as extra memory in the multiprocessor system or a non-deterministic algorithm.

15 Maximum Speedup. Amdahl's law.

16 With f the fraction of the computation that must be performed serially, the speedup factor is given by S(p) = p / (1 + (p - 1)f). This equation is known as Amdahl's law.

17 Speedup Against Number of Processors. Even with an infinite number of processors, the maximum speedup is limited to 1/f. Example: with only 5% of the computation being serial, the maximum speedup is 20, irrespective of the number of processors.
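A small C sketch that evaluates Amdahl's law numerically for the slide's example (f = 0.05), showing the speedup approaching, but never exceeding, the 1/f = 20 limit as p grows:

    #include <stdio.h>

    /* Amdahl's law: S(p) = p / (1 + (p - 1) * f), f = serial fraction */
    double amdahl(int p, double f) {
        return p / (1.0 + (p - 1) * f);
    }

    int main(void) {
        double f = 0.05;   /* 5% of the computation is serial */
        int procs[] = {1, 4, 16, 64, 256, 1024, 65536};
        for (int i = 0; i < 7; i++)
            printf("p = %6d   S(p) = %6.2f\n", procs[i], amdahl(procs[i], f));
        return 0;   /* S(p) tends to 1/f = 20 as p grows */
    }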

18 Superlinear Speedup Example: Searching. (a) Searching each sub-space sequentially.

19 (b) Searching each sub-space in parallel.

20 Question: What is the speed-up now?

21 The worst case for sequential search is when the solution is found in the last sub-space searched. The parallel version then offers the greatest benefit: if searching one sub-space completely takes time ts/p and the solution is found a small time Δt into the final sub-space, the sequential time is (p - 1)ts/p + Δt while the parallel time is just Δt, so the speedup ((p - 1)ts/p + Δt)/Δt grows without bound as Δt tends to zero.

22 The parallel version has least advantage when the solution is found in the first sub-space searched by the sequential search: both versions then take Δt, giving a speedup of 1. The actual speed-up depends upon which sub-space holds the solution, but it could be extremely large.

23 Types of Parallel Computers. Two principal types: 1. A single computer containing multiple processors - main memory is shared, hence called a "shared memory multiprocessor". 2. A multiple computer system.

24 Conventional Computer. Consists of a processor executing a program stored in a (main) memory. Each main memory location is identified by its address within a single memory space.

25 Shared Memory Multiprocessor. Extends the single-processor model: multiple processors connected to multiple memory modules, and each processor can access any memory module.

26 Examples: Dual Pentiums, Quad Pentiums.

27 Programming Shared Memory Multiprocessors. Threads: the programmer decomposes the program into parallel sequences (threads), each able to access variables declared outside the threads. Example: Pthreads. Alternatively, use a sequential programming language with preprocessor compiler directives, constructs, or syntax to declare shared variables and specify parallelism. Examples: OpenMP (an industry standard) and UPC (Unified Parallel C) - both need compiler support.
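A minimal OpenMP sketch of the directive style just described (the array, its size, and the reduction on a shared sum are invented for illustration; compile with OpenMP support, e.g. gcc -fopenmp):

    #include <stdio.h>
    #include <omp.h>

    #define N 1000000

    int main(void) {
        static double a[N];
        double sum = 0.0;

        /* Compiler directive: loop iterations are divided among threads;
           the shared variable "sum" is combined safely by the reduction clause. */
        #pragma omp parallel for reduction(+:sum)
        for (int i = 0; i < N; i++) {
            a[i] = i * 0.5;
            sum += a[i];
        }

        printf("sum = %f (up to %d threads)\n", sum, omp_get_max_threads());
        return 0;
    }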

28 Other approaches: a parallel programming language with syntax to express parallelism, where the compiler creates the executable code - not now common. Or a parallelizing compiler that converts regular sequential-language programs into parallel executable code - also not now common.

29 Multiple Computers. Message-passing multicomputer: complete computers connected through an interconnection network.

30 Networked Computers as a Computing Platform. Became a very attractive alternative to expensive supercomputers and parallel computer systems for high-performance computing in the 1990s. There were several early projects; notable ones include the Berkeley NOW (network of workstations) project and the NASA Beowulf project.

31 Key Hardware Advantages: Very high performance workstations and PCs are readily available at low cost. The latest processors can easily be incorporated into the system as they become available.

32 Programming Clusters. Usually based upon explicit message-passing. A common approach is a set of user-level libraries for message passing. Examples: Parallel Virtual Machine (PVM) - late 1980s, became very popular in the mid-1990s; Message-Passing Interface (MPI) - a standard defined in the 1990s and now dominant.
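A minimal MPI program showing the user-level library approach (a sketch; typically compiled with mpicc and started on the cluster with mpirun):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size;

        MPI_Init(&argc, &argv);                  /* start the MPI environment */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* this process's identity   */
        MPI_Comm_size(MPI_COMM_WORLD, &size);    /* total number of processes */

        printf("Hello from process %d of %d\n", rank, size);

        MPI_Finalize();                          /* shut down MPI */
        return 0;
    }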

33 Beowulf Clusters. The name given to a group of interconnected "commodity" computers designed to achieve high performance at low cost, typically using commodity interconnects (high-speed Ethernet) and typically running the Linux OS. The Beowulf name comes from the NASA Goddard Space Flight Center cluster project.

34 Cluster Interconnects. Originally fast Ethernet on low-cost clusters; Gigabit Ethernet gives an easy upgrade path. More specialized/higher-performance options: Myrinet - 2.4 Gbits/sec - disadvantage: single vendor; InfiniBand - may become important as InfiniBand interfaces may be integrated on next-generation PCs.

35 Dedicated cluster with a master node.

36 WCU Department of Mathematics and CS leo I cluster (now dismantled). Being replaced with Pentium IVs and Gigabit Ethernet.

37 Message-Passing Programming Using User-Level Message-Passing Libraries. Two primary mechanisms are needed: 1. A method of creating separate processes for execution on different computers. 2. A method of sending and receiving messages.

38 Multiple program, multiple data model (MPMD).

39 Single Program Multiple Data Model (SPMD). Different processes are merged into one program. Control statements select different parts for each processor to execute. All executables are started together - static process creation.
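A sketch of the SPMD style in MPI: every process runs the same executable, and control statements based on the process rank select which part of the work each one does (the item count and the work itself are invented for illustration):

    #include <stdio.h>
    #include <mpi.h>

    #define N 16

    int main(int argc, char *argv[]) {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)   /* control statement: only the master runs this branch */
            printf("Master: %d copies of the same program started together\n", size);

        /* Same loop in every copy, but each rank takes a different slice of items */
        for (int i = rank; i < N; i += size)
            printf("Process %d handles item %d\n", rank, i);

        MPI_Finalize();
        return 0;
    }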

40 Single Program Multiple Data Model (SPMD).

41 Multiple Program Multiple Data Model (MPMD). Separate programs for each processor. One processor executes the master process; the other processes are started from within the master process - dynamic process creation.
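The slides do not name a specific routine for dynamic process creation; one concrete way to express it is MPI-2's MPI_Comm_spawn, sketched below. The worker executable name "worker" is hypothetical:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        MPI_Comm workers;   /* intercommunicator to the spawned worker processes */

        MPI_Init(&argc, &argv);

        /* The master dynamically starts 4 copies of a separate "worker" program */
        MPI_Comm_spawn("worker", MPI_ARGV_NULL, 4, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &workers, MPI_ERRCODES_IGNORE);

        printf("Master spawned 4 worker processes\n");

        MPI_Finalize();
        return 0;
    }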

42 Multiple Program Multiple Data Model (MPMD).

43 "Point-to-point" send and receive routines. Passing a message between processes using send() and recv() library calls.

44 Synchronous Message Passing. Routines that return when the message transfer has been completed. Synchronous send routine: waits until the complete message can be accepted by the receiving process before sending the message. Synchronous receive routine: waits until the message it is expecting arrives.
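In MPI, the synchronous-send semantics described here correspond to MPI_Ssend, which does not complete until the matching receive has started; a small sketch (run with two processes; the data value is invented):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, data = 99;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            /* Synchronous send: returns only once the receiver has begun
               accepting the message, so the two processes synchronize here. */
            MPI_Ssend(&data, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&data, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("Received %d\n", data);
        }

        MPI_Finalize();
        return 0;
    }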

45 Synchronous send() and recv() using a 3-way protocol.

46 Synchronous routines intrinsically perform two actions: they transfer data, and they synchronize processes.

47 Asynchronous Message Passing. Routines that do not wait for actions to complete before returning. More than one version exists, depending upon the semantics for returning. They usually require local storage for messages. They do not synchronize processes, allowing processes to move forward sooner, but they must be used with care.

48 MPI Definitions of Blocking and Non-Blocking. Blocking: routines return after their local actions complete, though the message transfer may not have been completed. Non-blocking: routines return immediately. This assumes that the data storage used for the transfer is not modified by subsequent statements before the transfer completes, and it is left to the programmer to ensure this.
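A sketch of MPI's non-blocking calls: MPI_Isend and MPI_Irecv return immediately, and MPI_Wait is how the programmer knows when the buffer is safe to reuse (run with two processes; the values are invented):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, x = 55, y = 0;
        MPI_Request req;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            MPI_Isend(&x, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);  /* returns at once */
            /* other computation may proceed here, but x must not be modified yet */
            MPI_Wait(&req, MPI_STATUS_IGNORE);   /* now x may safely be reused */
        } else if (rank == 1) {
            MPI_Irecv(&y, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);  /* returns at once */
            MPI_Wait(&req, MPI_STATUS_IGNORE);   /* y is valid only after the wait */
            printf("Received y = %d\n", y);
        }

        MPI_Finalize();
        return 0;
    }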

49 How message-passing routines can return before the transfer has completed. A message buffer is needed between the source and destination to hold the message.

50 Asynchronous (blocking) routines changing to synchronous routines. Buffers are only of finite length, and a point could be reached when a send routine is held up because all available buffer space has been exhausted. The send routine then waits until storage becomes available again - i.e. the routine behaves as a synchronous routine.

51 Message Tag. Used to differentiate between different types of messages being sent. The message tag is carried within the message.

52 Message Tag Example. To send data x with message tag 5 from source process 1 to destination process 2 and assign it to y:
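The code for this slide is not in the transcript; a sketch of the same operation using MPI, where the tag argument (5) distinguishes this message from any others between the same pair of processes (run with at least three processes so that ranks 1 and 2 exist; the value of x is invented):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, x = 7, y;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 1) {          /* source process 1: send x with message tag 5 */
            MPI_Send(&x, 1, MPI_INT, 2, 5, MPI_COMM_WORLD);
        } else if (rank == 2) {   /* destination process 2: accept only tag 5 from process 1 */
            MPI_Recv(&y, 1, MPI_INT, 1, 5, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            printf("y = %d\n", y);
        }

        MPI_Finalize();
        return 0;
    }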

53 Wild Card. If message-tag matching is not required, a wild-card message tag is used; recv() will then match a send() with any tag. (In MPI, the wild cards are MPI_ANY_TAG and, for the source, MPI_ANY_SOURCE.)

54 Collective Message-Passing Routines. Routines that send message(s) to a group of processes or receive message(s) from a group of processes. They give higher efficiency than the equivalent separate point-to-point routines, although they are not absolutely necessary.

55 Broadcast. Sending the same message to all processes concerned with the problem.

56 Scatter. Sending each element of an array in the root process to a separate process; the contents of the ith location of the array are sent to the ith process.

57 Gather. Having one process collect individual values from a set of processes.

58 Reduce. A gather operation combined with an arithmetic/logical operation. Example: values gathered and added together.
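A sketch combining the collective routines from the last few slides: a broadcast from the root followed by a reduction that collects one value from every process and adds them together (the values themselves are invented for illustration):

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char *argv[]) {
        int rank, size, n = 0, total = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) n = 10;                          /* root sets a value            */
        MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);   /* broadcast: same value to all */

        int local = n + rank;                           /* each process computes a value */

        /* Reduce: gather one value from every process and add them; result at root */
        MPI_Reduce(&local, &total, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("Sum over %d processes = %d\n", size, total);

        MPI_Finalize();
        return 0;
    }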

59 Grid Computing. A grid is a form of multiple computer system. For solving computational problems, it could be viewed as the next step after cluster computing, using the same programming techniques. Why is this not necessarily true?

60 Because communication on a grid is VERY expensive: sending data across the network costs millions of cycles, bandwidth is shared with other users, and links are unreliable.

61 Computational Strategies. As a computing platform, a grid favors situations with the absolute minimum of communication between computers. The next class will look at these strategies and at the details of MPI programming.

