
Peter Couvares Computer Sciences Department University of Wisconsin-Madison High-Throughput Computing With Condor


1 Peter Couvares Computer Sciences Department University of Wisconsin-Madison pfc@cs.wisc.edu http://www.cs.wisc.edu/~pfc High-Throughput Computing With Condor

2 www.cs.wisc.edu/condor Who Are We?

3 www.cs.wisc.edu/condor The Condor Project (Established ‘85)
Distributed systems CS research performed by a team that faces:
  - software engineering challenges in a Unix/Linux/NT environment,
  - active interaction with users and collaborators,
  - daily maintenance and support challenges of a distributed production environment,
  - and educating and training students.
Funding: NSF, NASA, DoE, DoD, IBM, INTEL, Microsoft and the UW Graduate School.

4 www.cs.wisc.edu/condor The Condor System

5 www.cs.wisc.edu/condor The Condor System
› Unix and NT
› Operational since 1986
› More than 1300 CPUs at UW-Madison
› Available on the web
› More than 150 clusters worldwide in academia and industry

6 www.cs.wisc.edu/condor What is Condor?
› Condor converts collections of distributively owned workstations and dedicated clusters into a high-throughput computing facility.
› Condor uses matchmaking to make sure that everyone is happy.

7 www.cs.wisc.edu/condor What is High-Throughput Computing?
› High-performance: CPU cycles/second under ideal circumstances.
  - “How fast can I run simulation X on this machine?”
› High-throughput: CPU cycles/day (week, month, year?) under non-ideal circumstances.
  - “How many times can I run simulation X in the next month using all available machines?”

8 www.cs.wisc.edu/condor What is High-Throughput Computing?
› Condor does whatever it takes to run your jobs, even if some machines…
  - Crash! (or are disconnected)
  - Run out of disk space
  - Don’t have your software installed
  - Are frequently needed by others
  - Are far away & administered by someone else

9 www.cs.wisc.edu/condor What is Matchmaking?
› Condor uses Matchmaking to make sure that work gets done within the constraints of both users and owners.
› Users (jobs) have constraints:
  - “I need an Alpha with 256 MB RAM”
› Owners (machines) have constraints:
  - “Only run jobs when I am away from my desk and never run jobs owned by Bob.”
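The two-sided constraint check described on this slide can be sketched in a few lines of Python. This is only an illustration of the idea, not Condor's actual matchmaking mechanism or its syntax: the `Machine` and `Job` classes, the `match` function, and the machine names are all hypothetical, and the two constraints are taken directly from the slide's examples.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Machine:
    name: str
    arch: str
    memory_mb: int
    owner_idle: bool  # True when the owner is away from the desk
    # Owner policy (hypothetical): True if this machine accepts the job.
    accepts: Callable[["Machine", "Job"], bool]


@dataclass
class Job:
    owner: str
    # User requirement (hypothetical): True if the machine is acceptable.
    requires: Callable[[Machine], bool]


def match(job: Job, machines: List[Machine]) -> Optional[Machine]:
    """Return the first machine where BOTH sides' constraints hold."""
    for machine in machines:
        if job.requires(machine) and machine.accepts(machine, job):
            return machine
    return None


# Owner constraint from the slide: only when away, and never Bob's jobs.
def owner_policy(machine: Machine, job: Job) -> bool:
    return machine.owner_idle and job.owner != "bob"


# User constraint from the slide: an Alpha with 256 MB RAM.
def needs_alpha_256(machine: Machine) -> bool:
    return machine.arch == "ALPHA" and machine.memory_mb >= 256


machines = [
    Machine("intel1", "INTEL", 512, True, owner_policy),
    Machine("alpha1", "ALPHA", 128, True, owner_policy),
    Machine("alpha2", "ALPHA", 256, False, owner_policy),
    Machine("alpha3", "ALPHA", 256, True, owner_policy),
]

alice_job = Job("alice", needs_alpha_256)
bob_job = Job("bob", needs_alpha_256)

alice_match = match(alice_job, machines)
bob_match = match(bob_job, machines)
```

Here Alice's job lands on `alpha3`, the only idle Alpha with enough memory, while Bob's job matches nothing because every owner policy rejects it.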

10 www.cs.wisc.edu/condor “What can Condor do for me?”
Condor can…
› …do your housekeeping.
› …improve reliability.
› …give performance feedback.
› …increase your throughput!

11 www.cs.wisc.edu/condor Some Numbers: UW-CS Pool 6/98-6/00: 4,000,000 hours (~450 years)

“Real” Users                1,700,000 hours   ~260 years
  CS-Optimization             610,000 hours
  CS-Architecture             350,000 hours
  Physics                     245,000 hours
  Statistics                   80,000 hours
  Engine Research Center       38,000 hours
  Math                         90,000 hours
  Civil Engineering            27,000 hours
  Business                        970 hours

“External” Users              165,000 hours   ~19 years
  MIT                          76,000 hours
  Cornell                      38,000 hours
  UCSD                         38,000 hours
  CalTech                      18,000 hours

12 www.cs.wisc.edu/condor Condor & Physics

13 www.cs.wisc.edu/condor Current CMS Activity
› Simulation (CMSIM) for CalTech
  - provided >135,000 CPU hours to date
  - peak day ~ 4000 CPU hours
  - via NCSA Alliance, Condor has allocated 1,000,000 hours total to CalTech
› Simulation and Reconstruction (CMSIM + ORCA) for HEP group at UW-Madison

14 www.cs.wisc.edu/condor INFN Condor Pool - Italy
› Italian National Institute for Research in Nuclear and Subnuclear Physics
› 19 locations, each running a Condor pool
› as few as 1 CPU -- to >100 CPUs
› each locally controlled
› each “flocks” jobs to other pools when available

15 www.cs.wisc.edu/condor Particle Physics Data Grid
› The PPDG Project is...
  - a software engineering effort to design, implement, experiment, evaluate, and prototype HEP-specific data-transfer and caching software tools for Grid environments
› For example...

16 www.cs.wisc.edu/condor Condor PPDG Work
› Condor Data Manager
  - technology to automate & coordinate data movement from a variety of long-term repositories to available Condor computing resources & back again
  - keeping the pipeline full!
  - SRB (SDSC), SAM (Fermi), PPDG HRM

17 www.cs.wisc.edu/condor PPDG Collaborators

18 www.cs.wisc.edu/condor National Grid Efforts
› GriPhyN (Grid Physics Network)
› National Technology Grid - NCSA Alliance (NSF-PACI)
› Information Power Grid - IPG (NASA)
› close collaboration with the Globus project

19 www.cs.wisc.edu/condor I have 600 simulations to run. How can Condor help me?

20 www.cs.wisc.edu/condor My Application …
Simulate the behavior of F(x,y,z) for 20 values of x, 10 values of y and 3 values of z (20*10*3 = 600)
  - F takes on average 3 hours to compute on a “typical” workstation (total = 1800 hours)
  - F requires a “moderate” (128 MB) amount of memory
  - F performs “moderate” I/O: (x,y,z) is 5 MB and F(x,y,z) is 50 MB
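The arithmetic behind this workload can be checked with a short script. It is a sketch that assumes the 5 MB and 50 MB figures are the per-run input and output sizes; all variable names are illustrative.

```python
# Figures taken from the slide.
X_VALUES, Y_VALUES, Z_VALUES = 20, 10, 3
HOURS_PER_RUN = 3       # average time for one F(x,y,z) on a "typical" workstation
INPUT_MB_PER_RUN = 5    # assumed: size of one (x,y,z) input
OUTPUT_MB_PER_RUN = 50  # assumed: size of one F(x,y,z) output

runs = X_VALUES * Y_VALUES * Z_VALUES         # 600 simulations
total_hours = runs * HOURS_PER_RUN            # 1800 CPU hours
days_on_one_machine = total_hours / 24        # 75 days if run back to back
total_input_mb = runs * INPUT_MB_PER_RUN      # 3000 MB of input overall
total_output_mb = runs * OUTPUT_MB_PER_RUN    # 30000 MB of output overall

print(f"{runs} runs, {total_hours} CPU hours, "
      f"{days_on_one_machine:.0f} days on a single workstation")
```

Seventy-five days of uninterrupted computing on one workstation is where the 2.5-month vacation on the next slide comes from.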

21 www.cs.wisc.edu/condor Step I - get organized!
› Write a script that creates 600 input files, one for each of the (x,y,z) combinations
› Write a script that will collect the data from the 600 output files
› Turn your workstation into a “Personal Condor”
› Submit a cluster of 600 jobs to your personal Condor
› Go on a long vacation … (2.5 months)
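A minimal sketch of the first and fourth steps might look like the following. Everything specific here is an assumption made for illustration: the file names `in.N`, `out.N` and `err.N`, the executable name `F`, the one-line input format, and the use of the vanilla universe. The slides do not show the real scripts or submit file. The submit commands used (`universe`, `executable`, `input`, `output`, `error`, `log`, `queue`) and the `$(Process)` macro are standard in Condor submit description files.

```python
import itertools
from pathlib import Path


def write_inputs(directory, xs, ys, zs):
    """Write one input file per (x, y, z) combination; return the file count."""
    out_dir = Path(directory)
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for count, (x, y, z) in enumerate(itertools.product(xs, ys, zs), start=1):
        # File N holds the parameters for job N (numbered from 0), so that
        # it lines up with Condor's $(Process) macro in the submit file.
        (out_dir / f"in.{count - 1}").write_text(f"{x} {y} {z}\n")
    return count


def submit_description(n_jobs, executable="F"):
    """Build the text of a submit description file for one cluster of jobs."""
    return "\n".join([
        "universe   = vanilla",
        f"executable = {executable}",   # hypothetical program name
        "input      = in.$(Process)",   # stdin for job N is in.N
        "output     = out.$(Process)",  # stdout for job N goes to out.N
        "error      = err.$(Process)",
        "log        = F.log",
        f"queue {n_jobs}",
    ]) + "\n"
```

For the application above, one could call `write_inputs("runs", range(20), range(10), range(3))` with the real parameter values in place of the ranges, save the result of `submit_description(600)` to a file in the same directory, and hand that file to `condor_submit`.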

22 www.cs.wisc.edu/condor [Figure labels: your workstation; personal Condor; 600 Condor jobs]

23 www.cs.wisc.edu/condor Step II - build your personal Grid
› Install Condor on the desktop machine next door
› …and on the machines in the classroom.
› Install Condor on the department’s Linux cluster or the O2K in the basement.
› Configure these machines to be part of your Condor pool.
› Go on a shorter vacation...

24 www.cs.wisc.edu/condor [Figure labels: your workstation; personal Condor; 600 Condor jobs; Group Condor]

25 www.cs.wisc.edu/condor Step III - take advantage of your friends
› Get permission from “friendly” Condor pools to access their resources
› Configure your personal Condor to “flock” to these pools
› Reconsider your vacation plans...

26 www.cs.wisc.edu/condor [Figure labels: your workstation; friendly Condor; personal Condor; 600 Condor jobs; Group Condor]

27 www.cs.wisc.edu/condor Think BIG. Go to the Grid.

28 www.cs.wisc.edu/condor Upgrade to Condor-G
A Grid-enabled version of Condor that uses the inter-domain services of Globus to bring Grid resources into the domain of your Personal Condor
  - Easy to use on different platforms
  - Robust
  - Supports SMPs & dedicated schedulers

29 www.cs.wisc.edu/condor Step IV - Go for the Grid
› Get access (account(s) + certificate(s)) to a “Computational” Grid
› Submit 599 “Grid Universe” Condor-glide-in jobs to your personal Condor
› Take the rest of the afternoon off...

30 www.cs.wisc.edu/condor [Figure labels: your workstation; friendly Condor; personal Condor; 600 Condor jobs; Globus Grid; PBS; LSF; Condor; Group Condor; 599 glide-ins]

31 www.cs.wisc.edu/condor What Have We Done with the Grid Already?
› NUG30
  - quadratic assignment problem
  - 30 facilities, 30 locations: minimize cost of transferring materials between them
  - posed in 1968 as challenge, long unsolved
  - but with a good pruning algorithm & high-throughput computing...

32 www.cs.wisc.edu/condor NUG30 Personal Condor Grid
For the run we will be flocking to
  -- the main Condor pool at Wisconsin (600 processors)
  -- the Condor pool at Georgia Tech (190 Linux boxes)
  -- the Condor pool at UNM (40 processors)
  -- the Condor pool at Columbia (16 processors)
  -- the Condor pool at Northwestern (12 processors)
  -- the Condor pool at NCSA (65 processors)
  -- the Condor pool at INFN (200 processors)
We will be using glide_in to access the Origin 2000 (through LSF) at NCSA.
We will use "hobble_in" to access the Chiba City Linux cluster and Origin 2000 here at Argonne.

33 www.cs.wisc.edu/condor NUG30 - Solved!!!
Sender: goux@dantec.ece.nwu.edu
Subject: Re: Let the festivities begin.
Hi dear Condor Team,
you all have been amazing. NUG30 required 10.9 years of Condor Time. In just seven days!
More stats tomorrow!!! We are off celebrating!
condor rules!
cheers, JP.
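To put the figures in this message in perspective, the average number of processors working on NUG30 at any moment can be estimated as below. The sketch assumes that "10.9 years of Condor Time" means cumulative CPU time summed over all machines, and that the seven days are wall-clock time. The message itself does not spell this out.

```python
CPU_YEARS = 10.9     # "Condor Time" from the message (assumed cumulative CPU time)
WALL_DAYS = 7        # elapsed wall-clock time of the run
DAYS_PER_YEAR = 365

cpu_days = CPU_YEARS * DAYS_PER_YEAR        # total CPU time, in days
average_processors = cpu_days / WALL_DAYS   # mean number of CPUs busy at once

print(f"about {average_processors:.0f} processors busy on average")
```

Under those assumptions, the run kept several hundred processors busy around the clock for the whole week.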

34 www.cs.wisc.edu/condor Conclusion
Computing power is everywhere; we try to make it usable by anyone.

35 www.cs.wisc.edu/condor Need more info?
› Condor Web Page (http://www.cs.wisc.edu/condor)
› Peter Couvares (pfc@cs.wisc.edu)

