
1 1 Making Parallel Processing on Clusters Efficient, Transparent and Easy for Programmers Andrzej M. Goscinski School of Computing and Mathematics Deakin University Joint work with Michael Hobbs, Jackie Silcock and Justin Rough

2 2 Overview and Aims n Basic issues and solutions –Parallel processing: user expectations, clusters, phases –Parallelism management –Transparency –Communication paradigms –What to do? –Related systems n Cluster Execution environments –Middleware –Cluster operating systems n GENESIS –Architecture –Services for parallelism management and transparency n GENESIS programming interface –Message passing –DSM –Primitives n Easy to Use and Program Environment n Performance Study n Summary and Future Work

3 3 Parallel Processing: User Expectations n Affordable n Supercomputers for a poor man n Performance n Good performance n Ease of Use n Free from creation and placement concerns n Transparency n Unaware of location of processes n Ease of Programming n Choice and easy use of communication paradigm

4 4 Parallel Processing: Clusters n Clusters are an ideal platform for the execution of parallel applications n Many institutions (universities, banks, industries) move toward homogeneous non-dedicated clusters n Advantages: –Cheap to build: commodity PCs, networks –Widely available –Idle during weekends –Low utilization during working hours n Disadvantages: –Poor and difficult to use software (operating systems and runtime systems) –User unfriendly –Distribution of resources (CPUs and peripherals)

5 5 Parallel Processing Phases n Three distinct phases: –Initialization –Execution –Termination n Researchers and manufacturers mainly concentrate on execution to achieve the best performance n Ease of use of parallel systems and programmers' time are neglected n Application developers are discouraged as they have to program many activities that are of an operating system nature

6 6 Parallelism Management n Present operating systems that manage clusters are not built to support parallel processing n Reason: these operating systems do not provide services to manage parallelism n Parallelism management is the management of parallel processes and computational resources –Achieve high performance –Use computational resources efficiently –Make programming and use of parallel systems easy

7 7 Parallelism Management n Parallelism management in parallel programming tools, Distributed Shared Memory and enhanced operating system environments –has been neglected –left to the application developers n Application developers must deal –not only with parallel application development –but also with the problems of initiation and control for the execution on the cluster n Transparency and reliability (SSI) have been neglected – users do not see a cluster as a single powerful computer

8 8 Services for Parallelism Management on Clusters Services for parallelism management and transparency n Establishment of a virtual machine n Mapping of processes to computers n Parallel processes instantiation n Data (including shared) distribution n Initialisation of synchronization variables n Coordination of parallel processes n Dynamic load balancing

9 9 Transparency n Users should see a cluster as a single powerful computer n Dimensions of parallel processing transparency –Location transparency –Process relation transparency –Execution transparency –Device transparency

10 10 Communication Paradigms Two communication paradigms: n Message Passing (MP) Explicit communication between processes of a parallel application –Fast –Difficult to use for programmers n Distributed Shared Memory (DSM) Implicit communication between processes of a parallel application through shared memory objects –Easy to use –Demonstrates reduced performance n Claim: Operating environments that offer MP and DSM should be provided as a part of a cluster operating system as they manage system resources
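To make the contrast between the two paradigms concrete, the following is a minimal single-machine sketch in Python. It is only an analogy: threads stand in for parallel processes, a `queue.Queue` stands in for explicit message passing, and a lock-protected shared list stands in for a shared memory object. The function names are hypothetical and are not part of GENESIS, PVM or any DSM system.

```python
import queue
import threading


def _run_workers(worker, chunks):
    """Start one thread per chunk and wait for all of them to finish."""
    threads = [threading.Thread(target=worker, args=(c,)) for c in chunks]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def sum_with_messages(chunks):
    """Message passing: each worker explicitly sends its partial sum."""
    inbox = queue.Queue()

    def worker(chunk):
        inbox.put(sum(chunk))  # explicit send to the parent

    _run_workers(worker, chunks)
    # Explicit receive: the parent collects one message per worker.
    return sum(inbox.get() for _ in chunks)


def sum_with_shared_memory(chunks):
    """Shared memory: workers update one shared object; communication is implicit."""
    total = [0]              # the shared object
    lock = threading.Lock()  # coordination is still needed for updates

    def worker(chunk):
        partial = sum(chunk)
        with lock:
            total[0] += partial

    _run_workers(worker, chunks)
    return total[0]
```

In the first function the programmer writes every send and receive; in the second the workers simply touch a shared object, which is the ease-of-use argument made for DSM above.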

11 11 What to do? n Affordable n Clusters n Performance n Introduce special services n Ease of Use n Parallelism management n Transparency n Operating systems n Ease of Programming n Message passing and DSM n Development of cluster operating systems supporting parallel processing n Services of cluster operating systems: –Distributed services for transparent communication and management of basic system resources –Services for parallelism management and transparency

12 12 Related Systems Message Passing Systems n PVM –A set of cooperating server processes and specialized libraries that support process communication, execution and synchronization –A virtual machine must be set up by the user –Provides transparent process creation and termination n MPI –Objective is to standardize and coordinate the direction of various message passing applications, tools and environments –Provides limited process management functions to support parallel processing n HARNESS –Does not provide transparency –Programmers are forced to specify computers, map processes to these computers –Load imbalance is neglected

13 13 Related Systems DSM Systems n Research concentrates mainly on improving performance n Ease of use has been neglected n Munin –Programmers must label different variables according to the consistency protocol they require –The initialisation stage requires the application developer to define the number of computers to be used –Programmers must create a thread on each computer, initialise shared data and create synchronization variables n TreadMarks –The application developer has a substantial input into initialisation of DSM processes –Full transparency is not provided

14 14 Related Systems Execution Environments n The PVM, MPI and DSM approach of running on top of an operating system can be improved by enhancing the operating system itself to support parallel processing n Beowulf –Exploits a distributed process space to manage parallel processes –Processes can be started on a remote computer only after a logon operation on that computer has completed successfully –It addresses neither resource allocation nor load balancing –Transparent process migration is not provided

15 15 Related Systems Execution Environments n NOW –Combines specialized libraries and server processes with enhancement to the kernel –Enhancement: scheduling and communication kernel modules; GLUnix provides network-wide process, file and VM management –Parallelism management service: process initialisation on any cluster computer, support for semi-transparent start of parallel processes on multiple nodes (how to select nodes?), barriers, MPI n MOSIX –Provides enhanced and transparent communication and scheduling within the kernel –Employs PVM to provide parallelism support (initial placement) –Process migration transparently migrates processes –Provides dynamic load balancing and data collection –Remote communication is handled through the originating computer

16 16 Related Systems Summary n All systems but MOSIX are based on middleware – there has been no attempt to develop a comprehensive operating system to support parallel processing on clusters n The solutions are performance driven – little work has been done on making them programmer friendly n Problems from the parallel processing point of view: –Processes are created one at a time although the primitives provided enable the user to create multiple processes –These systems (with the exception of MOSIX) do not provide complete transparency –Virtual machine is not set up automatically –These systems do not provide load balancing

17 17 Cluster Execution Environments Execution environments that support parallel processing on clusters can be developed using n Middleware approach – at the application level n Underware – at the kernel level

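18 18 Middleware [Figure: two middleware stacks. In one, a user process runs over PVM software (library functions or separate software) on the Unix operating system; in the other, a user process runs over DSM software on the Unix operating system. In both, application processes sit above the operating system.]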

19 19 Middleware - summary n Middleware allows programmers –to develop parallel applications (PVM, MPI) –to execute parallel applications on clusters (Beowulf) –to employ shared memory based programming (Munin) –to achieve good execution performance –to take advantage of portability n Middleware –does not offer complete transparency –reduces potential execution performance (services are duplicated) –forces programmers to be involved in many time consuming and error prone activities that are of an operating system nature n Conclusion: to provide parallelism management, offer transparency, and make programming and use of a system easy, develop the needed services at the operating system level

20 20 Cluster operating systems n A cluster is a special kind of distributed system n A cluster operating system supporting parallel processing should –possess the features of a distributed operating system to deal with distributed resources and their management and hide distribution –exploit additional services to manage parallelism for applications and offer complete transparency –provide an enhanced programming environment n Three logical levels of a cluster operating system –Basic distributed operating system –Parallelism management and transparency system –Programming environment

21 21 Logical architecture of a cluster operating system [Figure: layered architecture. Programming environment (Message Passing/PVM over Communication Services; Shared Memory over DSM Services); Parallelism Management System; Enhanced Subset of a Distributed Operating System (Microkernel, Communication/File Management).]

22 22 GENESIS Cluster Operating System n Proof of concept n Client-server model, microkernel approach and object based approach (all entities have names) n All basic resources: processor, main memory, network, interprocess communication, files are managed by relevant servers n IPC - Message passing services –basic communication paradigm –cornerstone of the architecture –provided by IPC Manager and local IPC component of microkernel n IPC placement and relationship with other services designed to achieve high performance and transparency n DSM provided by Space (memory) and IPC Managers

23 23 The GENESIS Architecture [Figure: GENESIS architecture diagram showing parallel processes (MP, PVM, DSM), the Parallelism Management System, the kernel servers and the RHODOS Microkernel. Labeled components: Global Scheduler, DSM System, Execution Manager, IPC Manager, Space Manager, Process Manager, File/Cache Manager, Network Manager, Resource Discovery, Migration Manager.]

24 24 GENESIS Services for Parallelism Management and Transparency Basic services that provide parallelism management and offer transparency: n Establishment of a virtual machine n Process creation n Process duplication n Process migration n Global scheduling

25 25 Establishment of a Virtual Machine n Resource Discovery Server supports adaptive establishment of a virtual machine n Resource Discovery Server –Identifies n Idle and lightly loaded computers n Computer resources: e.g., processor model, memory size n Computational load and available memory n Communication patterns for each process –Passes information to the Global Scheduling Server per n Process n Server n Averaged over an entire cluster n Virtual machine changes dynamically n Some computers become overloaded or out of order n Some computers become idle
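As a rough illustration of adaptive virtual machine establishment, the sketch below filters a list of computer status reports down to the idle and lightly loaded ones. The record fields, the thresholds and the function name are assumptions made for this example; the slides do not specify how the Resource Discovery Server represents or evaluates this information.

```python
from dataclasses import dataclass


@dataclass
class ComputerStatus:
    """Hypothetical status report for one cluster computer."""
    name: str
    load: float           # assumed: fraction of CPU in use, 0.0 to 1.0
    free_memory_mb: int   # assumed: available main memory
    up: bool = True       # False when the computer is out of order


def establish_virtual_machine(statuses, max_load=0.3, min_free_memory_mb=0):
    """Return the names of computers that are up and lightly loaded."""
    return [
        s.name
        for s in statuses
        if s.up and s.load <= max_load and s.free_memory_mb >= min_free_memory_mb
    ]
```

Because the virtual machine changes dynamically, such a selection would be recomputed whenever fresh status reports arrive: a computer that becomes overloaded drops out of the result, and one that becomes idle joins it.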

26 26 Process Creation n Requirements –Multiple process creation – to create many instances of a process on a single or over many computers –Scalability – must be scalable to many computers –Complete transparency – must hide the location of all resources and processes n Three forms of process creation: n Single n Multiple n Group n Creation is invoked when the Execution Manager receives a process create request from a parent process –Execution Manager notifies Global Scheduler –Global Scheduler sends location on which process should be created –Execution Manager on selected computer manages process creation
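The creation request flow described on this slide can be sketched as follows. This is a toy, in-memory model: the class and method names are hypothetical, the "least loaded computer" policy is an assumption, and real GENESIS servers communicate by messages rather than direct method calls.

```python
class GlobalScheduler:
    """Toy scheduler: picks the computer that currently runs the fewest processes."""

    def __init__(self, loads):
        self.loads = dict(loads)  # computer name -> number of processes

    def choose_location(self):
        target = min(self.loads, key=self.loads.get)
        self.loads[target] += 1
        return target


class ExecutionManager:
    """Toy Execution Manager, one per computer."""

    def __init__(self, computer, scheduler, peers):
        self.computer = computer
        self.scheduler = scheduler
        self.peers = peers            # shared registry: name -> ExecutionManager
        self.local_processes = []
        peers[computer] = self

    def handle_create_request(self, program):
        # 1. Notify the Global Scheduler; 2. receive the chosen location.
        target = self.scheduler.choose_location()
        # 3. The Execution Manager on the selected computer creates the process.
        self.peers[target].create_locally(program)

    def create_locally(self, program):
        self.local_processes.append(program)
```

Note that `handle_create_request` returns nothing to the parent, which mirrors the transparency requirement: the parent asks for a child and never learns where it was placed.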

27 27 Process Creation Single and Multiple Services n Single process creation service –Similar to the services found in traditional systems supporting parallel processing –Requires executable image to be downloaded from disk for each parallel process to be created n Multiple process creation service –Supports the concurrent instantiation of a number of processes on a given computer through one creation call –When many computers are involved in multiple process creation, each computer is addressed in a sequential manner –Executable image of a parallel child process must be downloaded separately for each computer involved – scalability problem

28 28 Process Creation Group n Group process creation combines multiple process creation and group communication n Group process creation service –Allows multiple processes to be created concurrently on many computers –A single executable is downloaded from a file server using group communication
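The scalability difference between the three creation forms comes down to how many times the executable image has to be downloaded from the file server. A small counting sketch, with a hypothetical function name and a simplified model that ignores everything except image downloads:

```python
def image_downloads(form, processes_per_computer):
    """Count executable image downloads for one creation request.

    form: "single", "multiple" or "group"
    processes_per_computer: number of child processes wanted on each computer
    """
    counts = [n for n in processes_per_computer if n > 0]
    if form == "single":
        return sum(counts)          # one download per process
    if form == "multiple":
        return len(counts)          # one download per computer involved
    if form == "group":
        return 1 if counts else 0   # one download, delivered by group communication
    raise ValueError(f"unknown creation form: {form}")
```

For example, placing four children on each of three computers costs twelve downloads with single creation, three with multiple creation and one with group creation.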

29 29 Group Process Creation Behavior [Figure: group process creation. Labeled components: File Server, Global Scheduler, Parent, and on each of Computer 1, Computer 2, ..., Computer n an Execution Manager and a child process (Child 1, Child 2, ..., Child n).]

30 30 Process Duplication Single Local and Remote n Parallel processes are instantiated on selected computers by employing process duplication supported by process migration n Three forms of process duplication n Single local and remote n Multiple local and remote n Group remote n Single local and remote process duplication –Duplication is invoked when the Execution Manager receives a twin request from a parent process n Execution Manager notifies Global Scheduler n Global Scheduler sends a location on which twin should be placed n If this computer is remote process migration is employed

31 31 Process Duplication Multiple Local and Remote n Multiple local and remote process duplication is an enhancement of single process duplication n Duplication is invoked when the Execution Manager receives a multiple duplication request from a parent process –Execution Manager notifies Global Scheduler –Global Scheduler sends a location on which twin should be placed –If computer is local n Process Manager and Space Manager are requested to duplicate multiple copies of process entries and memory spaces –If computer is remote n the parent process is migrated to this destination n multiple copies of the parent process are duplicated n the parent process on the remote computer is killed n Child processes should be duplicated on many computers –Remote process duplication is performed for each selected computer
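The local and remote cases above can be sketched with plain dictionaries standing in for process entries and memory spaces. The data layout and the function name are assumptions for illustration only, and "migration" is modelled simply as copying the parent record to the destination.

```python
import copy


def duplicate_process(cluster, home, parent, placement):
    """Create twins of `parent` as directed by `placement`.

    cluster:   dict mapping computer name -> list of process records (dicts)
    home:      name of the computer on which the parent runs
    placement: dict mapping computer name -> number of twins to create there
    """
    for computer, count in placement.items():
        if computer == home:
            source = parent
        else:
            # Remote case: first migrate (here: copy) the parent to the destination.
            source = copy.deepcopy(parent)
            cluster[computer].append(source)
        # Duplicate the process entry and memory space `count` times.
        for i in range(count):
            twin = copy.deepcopy(source)
            twin["name"] = f"{parent['name']}-twin-{computer}-{i}"
            cluster[computer].append(twin)
        if computer != home:
            # Remote case: finally kill the migrated copy of the parent.
            cluster[computer].remove(source)
    return cluster
```

Each remote computer is handled in turn inside the loop, which corresponds to the sequential remote duplication described on this slide.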

32 32 Process Duplication Group Remote n When more than one remote computer is involved in process duplication the overall performance decreases n Decrease is caused by migrating a parent process to each remote computer sequentially n Performance is improved by employing group process migration –Process Managers and Execution Managers each join a relevant group and use group communication –The parent process is concurrently migrated to all selected remote computers involved in process duplication

33 33 Group Remote Process Duplication Behavior [Figure: group remote process duplication. Labeled components: Global Scheduler, and on each selected remote computer a Migration Manager, an Execution Manager, a migrated copy of the Parent and a child process (Child 1, Child 2, ..., Child n).]

34 34 Process Migration n Designed to separate policy from mechanism –Process Migration Manager acts as the coordinator for migration of the various resources that combine to form a process –Migration of resources (memory, process entries, buffers) is carried out by the Space, Process and IPC Managers, respectively n Two forms of process migration: single and group n Single process migration –Global Scheduler specifies which process to migrate and to which computer –Local Migration Manager requests its remote peer to prepare for a process –Local Migration Manager requests Space, Process and IPC Managers to migrate respective resources –Remote Migration Manager informs its local peer of successful migration –Local Migration Manager requests Space, Process and IPC Managers to delete the respective resources of the migrated process
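The single process migration steps can be walked through with a small in-memory model. Everything here is a sketch under assumptions: the class names, the `prepare`/`has_complete` handshake and the dictionary-based resources are invented for illustration, and direct method calls replace the inter-server messages a real system would use.

```python
class ResourceManager:
    """Toy Space, Process or IPC Manager holding one kind of resource per process."""

    def __init__(self, kind):
        self.kind = kind
        self.resources = {}  # process id -> resource

    def send(self, pid, peer):
        peer.resources[pid] = self.resources[pid]

    def delete(self, pid):
        del self.resources[pid]


class MigrationManager:
    """Toy coordinator: decides nothing itself, only drives the three managers."""

    KINDS = ("space", "process", "ipc")

    def __init__(self, computer):
        self.computer = computer
        self.managers = {kind: ResourceManager(kind) for kind in self.KINDS}
        self.expected = set()

    def prepare(self, pid):
        self.expected.add(pid)

    def has_complete(self, pid):
        return pid in self.expected and all(
            pid in m.resources for m in self.managers.values()
        )

    def migrate(self, pid, remote):
        remote.prepare(pid)                        # ask the remote peer to prepare
        for kind, manager in self.managers.items():
            manager.send(pid, remote.managers[kind])  # migrate each resource
        if not remote.has_complete(pid):           # remote peer reports the outcome
            raise RuntimeError("migration failed; local resources kept")
        for manager in self.managers.values():
            manager.delete(pid)                    # delete the local resources
```

The local resources are deleted only after the remote side confirms success, so a failed migration leaves the process intact on its source computer.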

35 35 Process Migration Behavior [Figure: process migration between a source computer and a destination computer. Labeled components: Global Scheduler, and on each computer a Migration Manager, Process Manager, Space Manager, IPC Manager and the process. Labeled transfers: Process State, Spaces, IPC Buffers, Event.]

36 36 Group Process Migration n Enhancement of the single process migration n Achieved by changing the one-to-one communication between the peer Migration Managers, Process Managers, Space Managers and IPC Managers to group communication n Global Scheduler specifies which process to migrate and to which computers –Each server migrates its respective resources to multiple destination computers in a single message using group communication –Parent process is duplicated on each remote computer –At the end of successful migration the parent process on each remote computer is killed

37 37 Global Scheduling n Makes policy decisions about which processes should be mapped to which computers n Input provided by the Resource Discovery Manager n Relies on mechanisms of –Single, multiple and group process creation and duplication services –Single and group process migration n The server combines services of –Static allocation – at the initial stage of parallel processing –Dynamic load balancing – to react to load fluctuations n Currently, the Global Scheduler is implemented as a centralized server
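The two services combined by the Global Scheduler can be illustrated with a deliberately simple policy pair. Round-robin initial placement and "move from busiest to idlest" balancing are assumptions chosen for brevity; the slides do not state which policies GENESIS uses, and the function names are hypothetical.

```python
def static_allocation(num_processes, computers):
    """Initial placement: map process ids to computers round-robin."""
    placement = {c: [] for c in computers}
    for pid in range(num_processes):
        placement[computers[pid % len(computers)]].append(pid)
    return placement


def rebalance(placement, threshold=1):
    """Dynamic load balancing: migrate processes until loads differ by at most `threshold`.

    Returns the list of migrations as (pid, source, destination) tuples.
    """
    if threshold < 1:
        raise ValueError("threshold must be at least 1 to guarantee termination")
    moves = []
    while True:
        busiest = max(placement, key=lambda c: len(placement[c]))
        idlest = min(placement, key=lambda c: len(placement[c]))
        if len(placement[busiest]) - len(placement[idlest]) <= threshold:
            return moves
        pid = placement[busiest].pop()   # policy decision: which process to move
        placement[idlest].append(pid)    # the mechanism would be process migration
        moves.append((pid, busiest, idlest))
```

The scheduler only decides what moves where; carrying out each move is left to the creation, duplication and migration mechanisms listed on the slide.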

38 38 GENESIS Programming Interface n Designed and developed to provide both communication paradigms: –Message passing –Shared memory [Figure: the logical architecture of a cluster operating system from slide 21, repeated.]

39 39 Message Passing n Basic Message Passing –Exploits basic interprocess communication concepts –Transparent and reliable local and remote IPC –Integral component of GENESIS –Offers standard message passing and RPC primitives n GENESIS PVM –PVM added to provide a well known parallelism programming tool –Ported from the UNIX based PVM –Implemented within a library in GENESIS –Mapping of the standard PVM services onto the GENESIS services –Performance improvement of PVM on GENESIS n No additional classic PVM server processes required n Direct interprocess communication model instead of the default model n Load balancing provided

40 40 Architecture of PVM on Unix [Figure: on each of Computer 1 and Computer 2, a user task linked with libpvm and a PVM server run above the kernel. Connections are labeled TCP Connections and UDP Datagrams.]

41 41 Architecture of PVM on GENESIS [Figure: on each of Computer 1 and Computer 2, user PVM parallel processes linked with libpvm run above the Execution Manager, Migration Manager, IPC Manager, Network Manager and microkernel. Also labeled: Global Scheduler, PVM Comms, Network.]

42 42 Distributed Shared Memory n DSM is an integral component of the operating system n Since DSM is a memory management function the DSM system is integrated into the Space Manager –Shared memory used as though it were physically shared –Easy to use shared memory –Low overhead, improved performance n Two consistency models supported: –Sequential – implemented using invalidation model –Release – implemented using write-update model n Synchronization and coordination of processes –Semaphores - owned by Space Manager on particular computer –Gaining ownership is distributed and mutually exclusive –Barriers used for coordination – their management is centralized
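The difference between the two consistency implementations can be seen in a toy model of a single shared variable replicated on several computers. This is a simplified sketch, not the GENESIS Space Manager: the class names are hypothetical, there is one variable instead of pages, and each release propagates only the releasing computer's own buffered write.

```python
class InvalidationDSM:
    """Sequential consistency sketch: a write invalidates every other copy."""

    def __init__(self, computers, initial=0):
        self.current = initial
        self.cache = {c: initial for c in computers}  # absent key = invalid copy

    def write(self, computer, value):
        self.current = value
        self.cache = {computer: value}  # all other copies are invalidated

    def read(self, computer):
        if computer not in self.cache:  # miss: fetch the current value
            self.cache[computer] = self.current
        return self.cache[computer]


class WriteUpdateDSM:
    """Release consistency sketch: writes are pushed to all copies at release."""

    def __init__(self, computers, initial=0):
        self.copies = {c: initial for c in computers}
        self.pending = {}  # computer -> value written but not yet released

    def write(self, computer, value):
        self.copies[computer] = value
        self.pending[computer] = value

    def release(self, computer):
        if computer in self.pending:
            value = self.pending.pop(computer)
            for c in self.copies:
                self.copies[c] = value  # update every copy

    def read(self, computer):
        return self.copies[computer]
```

With invalidation, a reader pays a fetch after every remote write; with write-update, remote copies stay readable but may be stale until the writer releases, which is why the programmer must bracket shared accesses with synchronization.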

43 43 Distributed Shared Memory [Figure: on each of Computer 1 and Computer 2, user DSM parallel processes run above the IPC Manager, Process Manager, Space Manager (with DSM) and microkernel. The computers are connected by a network and see a shared memory.]

44 44 GENESIS Primitives Execution n Two groups of primitives –to support execution services –for the provision of communication and coordination services

45 45 GENESIS Primitives Communication and Coordination n MP: send(), recv(), barrier() n PVM: pvm_send(), pvm_recv(), pvm_pkbuf(), pvm_unpkbuf(), pvm_barrier() n DSM: read access, write access, wait(), signal(), barrier()

46 46 Easy to Use and Program Environment GENESIS system n Provides an efficient and transparent environment for execution of parallel applications n Offers transparency n Relieves programmers from activities such as: –Selection of computers for a virtual machine for the given application –Setting up a virtual machine –Mapping processes to the virtual machine –Process instantiation using process creation and duplication supported by process migration –Load balancing

47 47 Easy to Use and Program Environment In the GENESIS system n Location of the remote computer(s) of the cluster is selected automatically by Global Scheduler n Users do not know process location n Programming of parallel applications has been made easy by providing –Message passing: standard and PVM –Distributed Shared Memory –Powerful primitives: implement sequences of operations and provide transparency process_ncreate(GROUP_CREATE,n, child_prog) –Process instantiation using process creation and duplication supported by process migration –Load balancing

48 48 Performance of Standard Parallel Applications n GENESIS System –13 Sun3/50 Workstations n 12 Computation + 1 File Server –10 Mbit/sec shared Ethernet n Influence of process instantiation on execution performance n GENESIS PVM vs. Unix PVM n Standard parallel applications –Successive Over Relaxation –Quicksort –Traveling Salesman Problem

49 49 Influence of Process Instantiation on Execution Performance Parallel Simulation (5, 25, 50 Second Workload) n Simulation - amount of work relates to the overall exec time n Two parameters: –Work load (5, 25, 50 Seconds) –Number of workstations (1..12) n Global scheduler & migration n Speedups for #comp = #proc
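To see why process instantiation matters more for short workloads, consider a back-of-the-envelope model in which the work divides perfectly across computers but each computer adds a fixed startup cost incurred one after another. The model, its parameter values and the function names are assumptions for illustration; they are not the measurements reported in the presentation.

```python
def modeled_time(workload_s, computers, startup_per_computer_s):
    """Hypothetical parallel time: divisible work plus sequential startup costs."""
    return workload_s / computers + startup_per_computer_s * computers


def modeled_speedup(workload_s, computers, startup_per_computer_s):
    """Speedup relative to running the whole workload on one computer with no startup cost."""
    return workload_s / modeled_time(workload_s, computers, startup_per_computer_s)


if __name__ == "__main__":
    # Workloads from the slide; the 0.2 s startup cost is an invented value.
    for workload in (5, 25, 50):
        print(workload, round(modeled_speedup(workload, 12, 0.2), 2))
```

Under this model the same startup cost takes a much larger share of a 5 second workload than of a 50 second one, so the short workload shows a far lower speedup on 12 workstations.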

50 50 GENESIS PVM vs. Unix PVM IPC Latency n Support for IPC provided by the PVM server in Unix was substituted with GENESIS operating system mechanisms n To measure the time saved by removing the server, a simple PVM application that exchanges messages (1 kbyte to 100 kbytes) was used n Round-trip time (including data packing and unpacking) was measured

51 51 GENESIS PVM vs. Unix PVM Speedup n The application used to study the influence of process instantiation (amount of work relates to the overall exec time) was studied n Parameters: –Number of workstations –GENESIS with and without load balancing

52 52 Successive Over Relaxation n Parallel applications developed based on algorithms of Rice University n Rice superior cluster hardware: DEC station-5000/240 + fast ATM net n For 8 computers – array size: Rice x 2048 elements with 101 iterations; GENESIS 128 x 128 elements with 10 iterations –DSM: TreadMarks – 6.3; GENESIS – 4.4 –PVM: Rice – 6.91; GENESIS – 5.1

53 53 Quicksort n Parallel applications developed based on algorithms of Rice n Rice superior cluster hardware: DEC station-5000/240 + fast ATM net n For 8 computers – array size: Rice x 1024 integers; GENESIS 256 x 256 integers –DSM: TreadMarks – 5.3; GENESIS – 2.5 –PVM: Rice – 6.79; GENESIS – 6.07

54 54 Traveling Salesman Problem n Parallel applications developed based on algorithms of Rice University n Rice superior cluster hardware: DEC station-5000/240 + fast ATM net n For 8 computers – 18 city tour; with the minimum threshold set to 13 cities –DSM: TreadMarks – 4.74; GENESIS – 6.33 –PVM: Rice – 5.63; GENESIS – 5.94

55 55 Summary n Nondedicated clusters are commonly available –Force application developers to program operating system operations –Do not offer transparency n Application developers need a computer system that –Processes applications efficiently –Uses cluster resources well –Allows users to see the cluster as a single powerful computer rather than as a set of connected computers n Proposal: employ a cluster operating system n Design: cluster operating system with three logical levels –Distributed operating system –Parallelism management and transparency system –Programming environment

56 56 Summary n GENESIS – designed and developed as a proof of concept n GENESIS is a system that satisfies user requirements n GENESIS approach is unique –Offers both message passing (MP and PVM) and DSM environment –Services providing parallelism management are integral components of an operating system –Provides a comprehensive environment to transparently manage system resources n Programmers do not have to be involved in parallelism management n Use of the cluster has been made easy n Complete transparency is offered n Good performance results have been achieved

57 57 Future Work n Port GENESIS to an Intel-like platform n Use virtual memory to support DSM n Offer reliable parallel computing services on clusters by employing –Reliable group communication –Checkpointing to offer fault tolerance

