NSF Site Visit 2-23-2006 HYDRA Using Windows Desktop Systems in Distributed Parallel Computing.


1 NSF Site Visit 2-23-2006 HYDRA Using Windows Desktop Systems in Distributed Parallel Computing

2 Introduction…
Windows desktop systems at IUB student labs
– 2300 systems, 3-year replacement cycle
– Pentium IV (>=1.6 GHz), 256/512/1024 MB memory, 10/100 Mbps or GigE networking, Windows XP
– More than 1.5 TF

3 Possibly Utilize Idle Cycles?
[Condor usage graph: red = total owner, blue = total idle, green = total Condor]

4 Problem Description
Once again: Windows desktop systems at IUB student labs
– As a scientific resource
– Harvest idle cycles

5 Constraints
– Systems dedicated to students using desktop office applications, not parallel scientific computing, making their availability unpredictable and sporadic
– Microsoft Windows environment
– Daily software rebuild (updates)

6 What could these systems be used for?
Many small computations and a few small messages
– Foreman-worker
– Parameter studies
– Monte Carlo
Goal: High Throughput Computing (not HPC)
– Parallel runs of the aforementioned small computations to make better use of the resource
– Parallel libraries (MPI, PVM, etc.) have constraints when the availability of resources is ephemeral, i.e. not predictable
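The foreman-worker / Monte Carlo workload pattern the slide describes can be sketched with stock Python; this is an illustration of the computation shape only (it uses `multiprocessing`, not SMBL or Condor, and all names are mine):

```python
import random
from multiprocessing import Pool

def worker(task):
    # Each task is an independent Monte Carlo batch: count random points
    # that land inside the unit quarter-circle. Tasks share no state, so
    # any idle machine can pick one up -- the essence of high throughput.
    seed, n = task
    rng = random.Random(seed)
    return sum(1 for _ in range(n)
               if rng.random() ** 2 + rng.random() ** 2 <= 1.0)

if __name__ == "__main__":
    tasks = [(seed, 10_000) for seed in range(16)]  # many small computations
    with Pool(4) as pool:                           # interchangeable workers
        hits = pool.map(worker, tasks)              # foreman scatters/gathers
    total = sum(n for _, n in tasks)
    print("pi estimate:", 4.0 * sum(hits) / total)
```

Because each task carries its own seed and size, lost workers can simply be reassigned their task, which is what makes the pattern tolerant of sporadic availability.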

7 Solution
Simple Message Brokering Library (SMBL)
– Limited replacement for MPI
– Both server and client library are based on a TCP socket abstraction
– Porting from MPI is fairly straightforward
Process and Port Manager (PPM)
Plus…
– Condor for job management and file transfer (without Condor checkpointing or Condor-level parallelism)
– Web portal for job submission
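SMBL's own wire protocol is not given in this transcript; as an illustration of the TCP socket abstraction both the server and client library are said to build on, here is a minimal length-prefixed send/receive over a local socket pair (all names are mine, not SMBL's):

```python
import socket
import struct
import threading

def send_msg(sock, payload: bytes):
    # Length-prefix framing: a 4-byte big-endian size, then the payload.
    sock.sendall(struct.pack(">I", len(payload)) + payload)

def _recv_exact(sock, n):
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed")
        buf += chunk
    return buf

def recv_msg(sock) -> bytes:
    size = struct.unpack(">I", _recv_exact(sock, 4))[0]
    return _recv_exact(sock, size)

def echo_server(listener):
    # Stand-in for a broker endpoint: accept one client, relay its message.
    conn, _ = listener.accept()
    with conn:
        send_msg(conn, recv_msg(conn))

if __name__ == "__main__":
    listener = socket.create_server(("127.0.0.1", 0))  # OS-assigned port
    threading.Thread(target=echo_server, args=(listener,)).start()
    with socket.create_connection(listener.getsockname()) as c:
        send_msg(c, b"hello from rank 1")
        print(recv_msg(c))
```

Framing every message with its length is what lets a broker relay opaque payloads between ranks without understanding them.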

8 The Big Picture
We’ll discuss each part in more detail next…
The shaded box indicates components hosted on multiple desktop computers

9 SMBL (Server)
SMBL server maintains a dynamic pool of client process connections
Worker job manager hides details of ephemeral workers at the application level

SMBL Server Process Table for a 4-CPU parallel session:
SMBL Rank     Condor-Assigned Node
0 (Foreman)   Wrubel Computing Center, sacramento
1             Chemistry Student Lab, computer_14
2             CS Student Lab, computer_8
3             Library, computer_6

10 SMBL (Server)
SMBL server maintains a dynamic pool of client process connections
Worker job manager hides details of ephemeral workers at the application level

SMBL Server Process Table for the same 4-CPU parallel session (note rank 2: the CS lab worker has been replaced by a Physics lab machine, illustrating the dynamic pool):
SMBL Rank     Condor-Assigned Node
0 (Foreman)   Wrubel Computing Center, sacramento
1             Chemistry Student Lab, computer_14
2             Physics Student Lab, computer_11
3             Library, computer_6

11 SMBL (Client)
Client library implements selected MPI-like calls
– MPI_Send() → SMBL_Send()
– MPI_Recv() → SMBL_Recv()
In charge of message delivery for each parallel process
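The actual SMBL client API is not reproduced in this transcript; purely to show why porting from MPI is mechanical, here is a toy in-process model of rank-addressed send/receive with the same call shape (the queue-based delivery and all signatures are my assumptions, not SMBL's):

```python
import queue
from collections import defaultdict

# One message queue per SMBL rank: a toy stand-in for the server's
# dynamic pool of client connections (single-process, illustration only).
_inbox = defaultdict(queue.Queue)

def SMBL_Send(data, dest):
    """Deliver `data` to the process holding SMBL rank `dest`."""
    _inbox[dest].put(data)

def SMBL_Recv(rank):
    """Block until a message addressed to `rank` arrives."""
    return _inbox[rank].get()

# Porting is then largely renaming and dropping MPI's datatype and
# communicator arguments:
#   MPI_Send(buf, n, MPI_BYTE, dest, tag, comm)  ->  SMBL_Send(buf, dest)
#   MPI_Recv(buf, n, MPI_BYTE, src, tag, comm)   ->  SMBL_Recv(rank)
SMBL_Send(b"work unit 7", dest=1)
print(SMBL_Recv(rank=1))  # b'work unit 7'
```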

12 Process and Port Manager (PPM)
– Starts the SMBL server and application processes on demand
– Assigns a port/host to each parallel session
– Directs workers to their servers
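PPM's implementation is not shown in the slides; a minimal sketch of just its port-assignment bookkeeping might look like the following (the class name, port range, and method names are my assumptions, and the real PPM also launches the server and application processes):

```python
import itertools

class PortManager:
    """Toy per-session endpoint bookkeeping (illustrative only)."""

    def __init__(self, host, first_port=9000):
        self.host = host
        self._ports = itertools.count(first_port)  # hand out ports in order
        self.sessions = {}

    def start_session(self, session_id):
        # Assign a dedicated (host, port) endpoint to this parallel
        # session's SMBL server.
        endpoint = (self.host, next(self._ports))
        self.sessions[session_id] = endpoint
        return endpoint

    def direct_worker(self, session_id):
        # A newly arrived worker asks where its session's server lives.
        return self.sessions[session_id]

ppm = PortManager("hydra.indiana.edu")
print(ppm.start_session("session-1"))  # ('hydra.indiana.edu', 9000)
print(ppm.start_session("session-2"))  # ('hydra.indiana.edu', 9001)
print(ppm.direct_worker("session-2"))  # ('hydra.indiana.edu', 9001)
```

Keeping the session-to-endpoint map in one place is what lets several independent parallel sessions share one broker host, as the next slide shows.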

13 PPM (cont’d…): with two SMBL servers (two parallel sessions)

Parallel Session 1:
SMBL Rank     Condor-Assigned Node
0 (Foreman)   Wrubel Computing Center, sacramento
1             Chemistry Student Lab, computer_14
2             CS Student Lab, computer_8
3             Wells Library, computer_6

Parallel Session 2:
SMBL Rank     Condor-Assigned Node
0 (Foreman)   Wrubel Computing Center, sacramento
1             Wells Library, computer_27
2             Biology Student Lab, computer_4
3             CS Student Lab, computer_2

14 Once again… the big picture
The shaded box indicates components hosted on multiple desktop computers

15 Recent Development
Hydra cluster TeraGrid-enabled! (Nov 2005)
– Allows TeraGrid users to use the resource
– Virtual-host-based solution: two different URLs for IU and TeraGrid users
– TeraGrid users authenticate against PSC’s Kerberos server

16 System Layout
PPM, SMBL server, Condor, and web portal running on a Linux server
– Dual Intel Xeon 3.0 GHz, 4 GB memory, GigE
A second Linux server runs Samba to serve the BLAST database

17 Portal
Creates and submits Condor files, handles data files
– Apache/PHP based
– Kerberos authentication
URLs:
– http://hydra.indiana.edu (IU users)
– http://hydra.iu.teragrid.org (TeraGrid users)
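The portal's PHP source is not included here; as an illustration of the "creates and submits Condor files" step, a minimal generator for a Condor submit description could look like this (the executable name, arguments, and file layout are my assumptions, not the portal's):

```python
def make_submit_file(executable, args, n_workers, session_id):
    """Render a minimal Condor submit description (values illustrative)."""
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {args}",
        "should_transfer_files   = YES",   # let Condor handle file transfer
        "when_to_transfer_output = ON_EXIT",
        f"log        = {session_id}.log",
        f"queue {n_workers}",              # one queued job per worker slot
    ]) + "\n"

print(make_submit_file("smbl_worker.exe", "--session session-1",
                       4, "session-1"))
```

The rendered text would then be handed to `condor_submit`, which queues one worker job per `queue` slot; the portal's remaining work is staging the user's data files alongside it.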

18 Utilization of Idle Cycles
[Condor usage graph: red = total owner, blue = total idle, green = total Condor]

19 Summary
Large parallel computing facility created at a low cost
– SMBL: a parallel message-passing library that can deal with ephemeral resources
– PPM: a port broker that can handle multiple parallel sessions
SMBL Homepage
– http://smbl.sourceforge.net (Open Source)

20 Links and References
Hydra Portal
– http://hydra.indiana.edu (IU users)
– http://hydra.iu.teragrid.org (TeraGrid users)
SMBL home page: http://smbl.sourceforge.net
Condor home page: http://www.cs.wisc.edu/condor/
IU TeraGrid home page: http://iu.teragrid.org

21 Links and References (cont’d…)
Parallel fastDNAml: http://www.indiana.edu/~rac/hpc/fastDNAml
BLAST: http://www.ncbi.nlm.nih.gov/BLAST
MEME: http://meme.sdsc.edu/meme/intro.html

