Parallel Computing using Condor on Windows PCs Peng Wang and Corey Shields Research and Academic Computing Division University Information Technology Services.


1 Parallel Computing using Condor on Windows PCs
Peng Wang and Corey Shields
Research and Academic Computing Division, University Information Technology Services, Indiana University

2 Problem Description
- Turn Windows desktop systems in STC labs (around 2000) into a parallel scientific computer

3 Discussion
- Parallel applications need coordination through message passing
- MPI does not handle ephemeral processes well
- Multiplexing communication among processes
- Ports brokered among multiple parallel sessions

4 What do we have?
- Condor NT, vanilla universe
  - Match-making, file transfer, fair sharing, job submission, suspension, preemption, restart, security
- Test application: fastDNAml-p
  - Parallel application, master-worker model, small granularity of work
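The master-worker model with small work units suits machines that can be reclaimed at any time, because a lost worker costs only one small task. The sketch below is a minimal simulation of that idea, not the fastDNAml-p code: `run_master`, `evaluate`, and `vanishes` are hypothetical names, and worker loss is modelled by a callback rather than by a real Condor preemption.

```python
from collections import deque

def run_master(tasks, workers, evaluate, vanishes):
    """Farm tasks out to workers, re-queueing work whose worker disappears.

    tasks:    iterable of hashable work items
    workers:  list of worker names (hypothetical)
    evaluate: function(task) -> result, standing in for the real computation
    vanishes: function(worker, task) -> True if the worker is lost mid-task
    """
    pending = deque(tasks)
    alive = list(workers)
    results = {}
    while pending:
        if not alive:
            raise RuntimeError("no workers left to finish the remaining tasks")
        # Round-robin over the workers that are still alive.
        worker = alive.pop(0)
        task = pending.popleft()
        if vanishes(worker, task):
            # The desktop was reclaimed: forget the worker, retry the task later.
            pending.append(task)
            continue
        results[task] = evaluate(task)
        alive.append(worker)
    return results
```

Because each task is small, re-queueing it after a loss wastes little computation, which is the practical benefit of fine granularity on ephemeral resources.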

5 How we did it
- Simple Message Brokering Library (SMBL)
- Process and Port Manager (PPM)
- A mechanism for users to submit jobs (web portal)

6 SMBL
- An I/O multiplexing server handles message delivery for each parallel session (serializing communication)
- The SMBL client library implements selected MPI-like calls
- Both the server and the client library are built on a TCP socket abstraction library
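The brokering idea can be illustrated with an in-memory model: every message passes through one broker, which queues it for the destination rank until that rank asks for it. This is only a sketch under stated assumptions. It uses no sockets, and `Broker`, `send`, `recv`, and `ANY` are hypothetical names chosen to resemble MPI-style calls, not SMBL's actual API.

```python
from collections import defaultdict, deque

ANY = -1  # wildcard for source or tag, in the spirit of MPI_ANY_SOURCE / MPI_ANY_TAG

class Broker:
    """Central message store: clients never talk to each other directly."""

    def __init__(self):
        # destination rank -> queue of (source, tag, payload) in arrival order
        self.mailboxes = defaultdict(deque)

    def send(self, src, dest, tag, payload):
        """Queue a message for dest; the sender does not need dest to be alive."""
        self.mailboxes[dest].append((src, tag, payload))

    def recv(self, rank, src=ANY, tag=ANY):
        """Return the oldest queued message for rank matching src and tag, or None."""
        box = self.mailboxes[rank]
        for i, (s, t, payload) in enumerate(box):
            if src in (ANY, s) and tag in (ANY, t):
                del box[i]
                return s, t, payload
        return None
```

Routing everything through one process means a worker that vanishes only leaves undelivered messages in its mailbox; the other clients hold no connection to it that could break.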

7 Process and Port Manager
- Assigns a port to each SMBL server process
- Starts the SMBL server and application processes on demand
- Directs workers to their servers
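The port-brokering role can be sketched as a small registry that hands each parallel session its own port and tells workers where to connect. The class and method names below are hypothetical, and the sketch only tracks port numbers; it does not launch any processes.

```python
class PortManager:
    """Hand out one TCP port per parallel session from a fixed range."""

    def __init__(self, first_port, last_port):
        self.free = list(range(first_port, last_port + 1))
        self.sessions = {}  # session id -> port of that session's server

    def start_session(self, session_id):
        """Reserve a port for a new session (idempotent for a known session)."""
        if session_id in self.sessions:
            return self.sessions[session_id]
        if not self.free:
            raise RuntimeError("no free ports for a new session")
        port = self.free.pop(0)
        self.sessions[session_id] = port
        return port

    def lookup(self, session_id):
        """Tell a worker which port its session's server listens on."""
        return self.sessions[session_id]

    def end_session(self, session_id):
        """Release the session's port so a later session can reuse it."""
        self.free.append(self.sessions.pop(session_id))
```

Keeping the assignment in one place is what lets several parallel sessions share the same pool of machines without their servers colliding on a port.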

8 The Portal
- Apache-based PHP web interface
- Creates and submits the Condor submit files
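As an illustration of what generating a submit file involves, the sketch below builds a vanilla-universe submit description as text. It is written in Python rather than the portal's PHP, the function name and parameters are hypothetical, and the file-transfer commands shown are the ones documented for current Condor releases, which may differ from those used by the version in this talk.

```python
def make_submit_description(executable, arguments, n_workers, job_name):
    """Return the text of a Condor submit description for n_workers identical jobs."""
    lines = [
        "universe = vanilla",
        f"executable = {executable}",
        f"arguments = {arguments}",
        # Ship files to the execute machine instead of relying on a shared filesystem.
        "should_transfer_files = YES",
        "when_to_transfer_output = ON_EXIT",
        # $(Process) expands to 0..n_workers-1, giving each job its own output files.
        f"output = {job_name}.$(Process).out",
        f"error = {job_name}.$(Process).err",
        f"log = {job_name}.log",
        f"queue {n_workers}",
    ]
    return "\n".join(lines) + "\n"
```

The resulting text would be written to a file and passed to `condor_submit`; how the portal actually invokes it is not described in the slides.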

9 The Big Picture
- The shaded box indicates components hosted on multiple desktop computers

10 Statistics
- Red: total owner
- Blue: total idle
- Green: total Condor

11 Scalability Issues
- Needed big server
- Adjusted condor_config
  - MAX_JOBS_RUNNING 1000
  - SHADOW_SIZE_ESTIMATE 900KB
  - MAX_STARTD_LOG 640KB
- Lost workers because of per-process file descriptor limit (1024)
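The descriptor limit can be made concrete with a little arithmetic. The sketch below assumes a server process that holds one socket descriptor per connected worker and keeps a few descriptors for other uses; the `reserved` figure of 16 and both function names are illustrative assumptions, not values from the talk.

```python
try:
    import resource  # Unix-only module; absent on Windows
except ImportError:
    resource = None

def fd_soft_limit(default=1024):
    """Return this process's soft limit on open descriptors, or default if unknown."""
    if resource is None:
        return default
    soft, _hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return default if soft == resource.RLIM_INFINITY else soft

def max_worker_connections(fd_limit, reserved=16):
    """Upper bound on simultaneous workers if each one costs a single descriptor."""
    return max(fd_limit - reserved, 0)
```

Under these assumptions a limit of 1024 leaves room for roughly a thousand workers per server process, so sessions approaching that size need either a raised limit or more than one server.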

12 Summary
- Built a large parallel scientific computing facility using Condor
- Built a parallel message passing library to deal with ephemeral resources
- Built a port broker to handle multiple parallel sessions
- Built a web portal
- It is open source; visit http://smbl.sourceforge.net

