Condor use in Department of Computing, Imperial College
Stephen McGough, David McBride
London e-Science Centre
2 Computing Resources
Dedicated 16-node Linux cluster (thor)
250+ workstations in undergraduate labs
200+ workstations for research, PhD and support staff
– 1.4 GHz Athlons to 3.0 GHz Pentium 4s, 512 MB to 1 GB RAM
Well-provisioned Extreme Networks infrastructure
– 100 Mbit/s full duplex to the desk, 1 Gbit/s fibre backbone with 2 BlackDiamond core routers
3 Operating Environment
Standardized, centrally managed Windows and Linux installations
– Nearly every machine has a Linux install
– Windows is installed only on a subset of desktops
– Automated configuration, software installation and updates
Shared automounted /home and /vol filesystems
– Served by a small number of central NFS fileservers
– Numerous /vol areas provided for individual research groups
– Includes /vol/condor to support Condor activity
No firewalls deployed within the departmental netblock
– Firewalls sit between the pool hosts and the outside world, but hosts inside the department have unrestricted access to one another
4 Original Motivation for Condor
An experiment!
Lots of capable workstations sit idle for substantial portions of the day
Wanted to make better use of these resources
Condor is an ideal framework
– Simple to set up
– Freely available
– Low maintenance
5 Condor Configuration
Operated in a cycle-stealing mode
– The only dedicated machine is an old Athlon workstation running the condor_negotiator and condor_collector daemons
Primary concern is not to impinge upon users' main work
– By all means use up any spare CPU cycles, but get out of the way when the user returns
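The cycle-stealing policy above can be sketched as a small decision function. This is a minimal illustration under stated assumptions, not Condor's implementation: in a real pool the policy is written as ClassAd expressions (START, SUSPEND, CONTINUE, PREEMPT) that each machine's condor_startd evaluates against attributes such as KeyboardIdle and LoadAvg. The `MachineState` type, the `decide` function and all threshold values here are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    START = "start"        # machine looks unused: begin running a guest job
    SUSPEND = "suspend"    # owner is back: pause the guest job
    CONTINUE = "continue"  # owner has gone quiet again: resume the job
    KEEP = "keep"          # leave things as they are


# Hypothetical thresholds, chosen only for illustration.
START_IDLE_SECS = 15 * 60   # keyboard must be idle this long before starting
BUSY_IDLE_SECS = 60         # activity within the last minute means "user returned"
MAX_OWNER_LOAD = 0.3        # non-Condor CPU load tolerated when starting a job


@dataclass
class MachineState:
    keyboard_idle: float  # seconds since last keyboard/mouse activity
    owner_load: float     # CPU load not attributable to the guest job
    job_state: str        # "none", "running" or "suspended"


def decide(m: MachineState) -> Action:
    """Choose an action for the guest job, always favouring the desktop owner."""
    user_active = m.keyboard_idle < BUSY_IDLE_SECS
    if m.job_state == "running" and user_active:
        return Action.SUSPEND
    if m.job_state == "suspended" and m.keyboard_idle >= START_IDLE_SECS:
        return Action.CONTINUE
    if (m.job_state == "none"
            and m.keyboard_idle >= START_IDLE_SECS
            and m.owner_load <= MAX_OWNER_LOAD):
        return Action.START
    return Action.KEEP


if __name__ == "__main__":
    # An hour of idleness on a quiet machine: start a job.
    print(decide(MachineState(keyboard_idle=3600, owner_load=0.05, job_state="none")))
    # The owner touched the keyboard 5 seconds ago: get out of the way.
    print(decide(MachineState(keyboard_idle=5, owner_load=0.9, job_state="running")))
```

The asymmetry is deliberate: a single minute of activity is enough to suspend, but a long idle period is required before starting or resuming, so the owner rarely notices the guest job. A fuller policy would also evict (PREEMPT) jobs that stay suspended for too long.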
6 Production users
Now have a number of high-throughput users:
– Bioinformatics: evaluating protein-protein interaction network evolution models
– Visual Information Processing: non-rigid registration of 3D infant brain MR images
– London e-Science Centre: GENIE (Grid ENabled Integrated Earth system model)
– Teaching: part of the Grid Computing course tutorial work
7 Recent statistics
[Chart annotated with: start of term (main lab back online); nightly reboot; new desktops get Condor switched on; overnight maintenance]
8 Perceived Benefits
Makes better use of otherwise idle resources
Frees up compute time on the production cluster hardware
Lowers the barrier to obtaining large quantities of CPU time
9 Issues
User detection currently not fully functional…!
– Recent Linux kernel revisions don't behave as Condor expects
– When a user logs in through X11 without opening a terminal, Condor doesn't notice them
– Fix being developed
Jobs sometimes consume disk resources to exhaustion
– Low-tech solution: ask users not to generate large quantities of output
Source code availability?
– Condor is effectively already managed as an open source project
– Source would have been helpful when diagnosing the fault
– (Documentation, however, is excellent.)
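One plausible way to see how the X11 blind spot arises: Condor's startd has historically inferred idle time partly from the access times of terminal and console devices (the CONSOLE_DEVICES setting in the Condor manual lists the extra devices to watch). The sketch below assumes a detector of that kind; `keyboard_idle_secs` is a hypothetical helper, not Condor code, and the device paths in the usage line are illustrative.

```python
import os
import time
from typing import Iterable, Optional


def keyboard_idle_secs(devices: Iterable[str], now: Optional[float] = None) -> float:
    """Seconds since the most recently accessed watched device was touched.

    Hypothetical helper: treats the newest access time (st_atime) among the
    given device paths as the last moment the user was active. Paths that do
    not exist are skipped; if none exist, the machine is treated as idle forever.
    """
    now = time.time() if now is None else now
    latest = None
    for path in devices:
        try:
            atime = os.stat(path).st_atime
        except OSError:
            continue  # device absent (or unreadable) on this host
        latest = atime if latest is None else max(latest, atime)
    if latest is None:
        return float("inf")
    return max(0.0, now - latest)


if __name__ == "__main__":
    # Only terminal devices are watched here. A user working in an X11
    # session without a terminal never touches them, so the reported idle
    # time keeps growing and the machine looks free for Condor jobs.
    print(keyboard_idle_secs(["/dev/tty1", "/dev/pts/0", "/dev/console"]))
```

Any detector built this way is only as good as its device list: activity on a device it does not watch, or on one whose access time the kernel no longer updates, is invisible to it.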
10 Comparison with Sun Grid Engine
SGE is used on LeSC's dedicated high-performance clusters
Different fundamental design philosophy:
– SGE uses a central, static configuration
– Condor is designed to function well with a floating pool
SGE has some features Condor lacks:
– Greater control over queuing policy
– SGE 6.0 provides advanced reservation capability
– Source code readily available
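The "floating pool" idea can be illustrated with a registry in which machines join simply by advertising themselves and drop out by going quiet, rather than being listed in a central static configuration. This is a minimal sketch under that assumption; the `FloatingPool` class and host names are hypothetical, though Condor's collector has a comparable notion of discarding machine ads that are not refreshed (the CLASSAD_LIFETIME setting).

```python
from typing import Dict, List


class FloatingPool:
    """Hypothetical registry: hosts join by advertising, leave by going quiet."""

    def __init__(self, lifetime_secs: float) -> None:
        self.lifetime_secs = lifetime_secs
        self._last_seen: Dict[str, float] = {}

    def advertise(self, host: str, now: float) -> None:
        # Periodic heartbeat; a brand-new host simply appears in the pool.
        self._last_seen[host] = now

    def members(self, now: float) -> List[str]:
        # Forget hosts whose last advertisement is older than the lifetime
        # (rebooted, switched off, or Condor disabled), then list the rest.
        self._last_seen = {
            host: seen for host, seen in self._last_seen.items()
            if now - seen <= self.lifetime_secs
        }
        return sorted(self._last_seen)


if __name__ == "__main__":
    pool = FloatingPool(lifetime_secs=900)
    pool.advertise("lab01", now=0)
    pool.advertise("lab02", now=500)
    print(pool.members(now=1000))  # lab01 has gone quiet for too long
```

Under this design, nightly reboots or new desktops having Condor switched on need no central reconfiguration: membership follows whichever machines are currently advertising.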
11 Conclusions
We consider the experiment to be very successful
Condor has become essential to the work of others in the department and the College at large
Very satisfied with the quality of the implementation and documentation