Zach Miller Computer Sciences Department University of Wisconsin-Madison What’s New in Condor.

1 What’s New in Condor
Zach Miller
Computer Sciences Department
University of Wisconsin-Madison
zmiller@cs.wisc.edu
http://www.cs.wisc.edu/condor

2 Overview
› Condor Development Process
  - Stable vs. Development
› New Features in 6.6.0
› Significant improvements which are covered in other talks:
  - What’s new in Condor-G, covered by Todd Tannenbaum
  - Hawkeye, covered by Nick LeRoy
  - COD (Computing On Demand), covered by Derek Wright
  - Packaging and Testing, covered by Alain Roy

3 Condor Development Process
› We maintain two different release series at all times
  - Stable Series: the second digit is even, e.g. 6.2.2, 6.4.7, 6.6.0
  - Development Series: the second digit is odd, e.g. 6.3.1, 6.5.2
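The numbering rule can be checked mechanically. The following is a minimal Python sketch; `release_series` is a hypothetical helper written for this illustration, not part of Condor, and it looks only at the parity of the second number.

```python
def release_series(version: str) -> str:
    """Return "stable" or "development" for a dotted Condor version string.

    Hypothetical helper: it applies only the parity rule
    (even second number = stable, odd second number = development).
    """
    parts = version.split(".")
    if len(parts) < 2 or not all(p.isdigit() for p in parts):
        raise ValueError(f"not a dotted numeric version: {version!r}")
    minor = int(parts[1])
    return "stable" if minor % 2 == 0 else "development"


for v in ["6.2.2", "6.4.7", "6.6.0", "6.3.1", "6.5.2"]:
    print(v, release_series(v))
```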

4 Stable Series
› Heavily tested
› Runs on our production pool of nearly 1,000 CPUs
› No new features, only bugfixes, are allowed into a stable series
› A given stable release is always compatible with other releases from the same series
  - 6.4.X is compatible with 6.4.Y
› Recommended for production pools

5 Development Series
› Less heavily tested
› Runs on our small(er) test pool
› New features and new technology are added frequently
› Versions from the same development series are not always compatible with each other
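The compatibility promises of the two series can be summarized as a predicate. This is a hypothetical Python sketch (`parse_version` and `guaranteed_compatible` are illustrative names, not Condor tools) that assumes plain `major.minor.patch` version strings. It answers only whether compatibility is guaranteed, so development releases always return False even when they happen to interoperate.

```python
def parse_version(version: str) -> tuple[int, int, int]:
    """Split 'major.minor.patch' into integers, e.g. '6.4.7' -> (6, 4, 7)."""
    major, minor, patch = (int(p) for p in version.split("."))
    return major, minor, patch


def guaranteed_compatible(a: str, b: str) -> bool:
    """True only when the release policy promises compatibility.

    Releases from the same stable series (same major.minor, even minor)
    are promised to be compatible. Development releases carry no such
    promise, so they return False.
    """
    a_major, a_minor, _ = parse_version(a)
    b_major, b_minor, _ = parse_version(b)
    same_series = (a_major, a_minor) == (b_major, b_minor)
    return same_series and a_minor % 2 == 0


print(guaranteed_compatible("6.4.1", "6.4.7"))  # True
print(guaranteed_compatible("6.5.1", "6.5.2"))  # False: development series
print(guaranteed_compatible("6.4.7", "6.6.0"))  # False: different series
```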

6 Overview of New Features
› Windows
› DAGMan
› Better Security
› Central Manager
› Improved Negotiation
› Black Holes
› New Utilities
› Smarter File Transfer
› Submit-time file staging
› New Installer
› ClassAd improvements
› And More!!

7 Improvements in Condor for Windows
› Ability to run SCHEDULER universe jobs
  - DAGMan
  - Any executable or batch file
› JAVA universe support
  - JVM provided by execution site
  - Better error management
  - Ability to use CHIRP (Remote I/O)

8 Improvements in Condor for Windows (cont.)
› New support for:
  - Windows XP
  - Foreign-language versions of Windows
  - Legacy 16-bit applications
› Improved Windows-to-UNIX job submission and vice versa
› BirdWatcher, a system tray icon that gives basic status and control of Condor

9 New Features in DAGMan
› DAGMan previously required that all jobs share one log file
› Each job can now have its own log file
› Understands XML userlogs
› Can produce .dot file graphs
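A .dot file is Graphviz's text format for graphs, which tools such as `dot -Tpng` can render. The sketch below is a hypothetical Python illustration of turning job dependencies into DOT text; it is not DAGMan's own code, and DAGMan's actual output may label and style nodes differently. The `name` argument is assumed to be a valid DOT identifier.

```python
def dag_to_dot(jobs: list[str], edges: list[tuple[str, str]], name: str = "DAG") -> str:
    """Render a DAG of jobs as Graphviz DOT text.

    jobs  -- node names (such as the names on DAGMan JOB lines)
    edges -- (parent, child) pairs (such as those from PARENT ... CHILD ... lines)
    """
    out = [f"digraph {name} {{"]
    for job in jobs:
        out.append(f'    "{job}";')
    for parent, child in edges:
        out.append(f'    "{parent}" -> "{child}";')
    out.append("}")
    return "\n".join(out)


# A "diamond" DAG: A runs first, B and C depend on A, D depends on both.
print(dag_to_dot(["A", "B", "C", "D"],
                 [("A", "B"), ("A", "C"), ("B", "D"), ("C", "D")],
                 name="diamond"))
```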

10 Better Security
› GSI (X.509 certificates) implementation is more complete and customizable
  - Each Condor daemon can have its own certificate
  - You can run a “Personal Condor” with your user proxy
› Easier configuration
  - If you already have Globus installed, very little additional configuration of Condor is necessary to start using X.509 certificates for authentication
› Improved error messages if something goes wrong
  - Tells you whether the problem was network-, authentication-, or authorization-related

11 Central Manager New Features
› Keeps statistics on missed updates
› Can use TCP instead of UDP, if you must
› Redundant central managers can be run using the SECONDARY_COLLECTOR_LIST parameter
  - If the main central manager goes down, you can still run administrative commands
› Central manager daemons can now run on any port
    COLLECTOR_HOST = condor.cs.wisc.edu:9019
    NEGOTIATOR_HOST = condor.cs.wisc.edu:9020
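The `host:port` form used by COLLECTOR_HOST and NEGOTIATOR_HOST can be split as in this minimal Python sketch. `split_host_port` is a hypothetical helper, not Condor's parser; the caller supplies the default port (9618 below, the collector's usual default), and bracketed IPv6 addresses are not handled.

```python
def split_host_port(address: str, default_port: int) -> tuple[str, int]:
    """Split a 'host[:port]' value such as COLLECTOR_HOST into (host, port).

    Falls back to default_port when no ':port' suffix is present.
    Does not handle bracketed IPv6 literals.
    """
    host, sep, port = address.rpartition(":")
    if not sep:  # no colon: the whole string is the host name
        return address, default_port
    if not host or not port.isdigit():
        raise ValueError(f"malformed host:port value: {address!r}")
    return host, int(port)


print(split_host_port("condor.cs.wisc.edu:9019", 9618))  # ('condor.cs.wisc.edu', 9019)
print(split_host_port("condor.cs.wisc.edu", 9618))       # ('condor.cs.wisc.edu', 9618)
```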

12 Improved Negotiation
› Allows the condor_schedd (the job queue manager) to send “classes” of jobs to the negotiator for matching
› Previously, jobs were sent one at a time
› Now, 1000 identical jobs take the same time to negotiate as 100, 10, or just one job
› Currently, job classes are defined in the condor_config file. Very soon, they will be determined automatically
  - “Buckets” will be needed
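The idea behind negotiating by class is that jobs with identical matchmaking attributes need only one request. The Python sketch below groups stand-in job records by a signature; the choice of attributes (`Requirements`, `ImageSize`) is an assumption for illustration, not the set Condor uses, and `group_into_classes` is a hypothetical name.

```python
from collections import defaultdict


def group_into_classes(jobs, keys):
    """Group jobs whose matchmaking-relevant attributes are identical.

    jobs -- list of dicts standing in for job ClassAds
    keys -- attribute names assumed to decide how a job is matched
    Returns {signature: [jobs...]}, where signature is the tuple of those values.
    """
    classes = defaultdict(list)
    for job in jobs:
        signature = tuple(job.get(k) for k in keys)
        classes[signature].append(job)
    return dict(classes)


# 1000 identical jobs plus one different job -> only two negotiation requests.
jobs = [{"Requirements": 'Arch == "INTEL"', "ImageSize": 100} for _ in range(1000)]
jobs.append({"Requirements": 'Arch == "SUN4u"', "ImageSize": 100})
classes = group_into_classes(jobs, ["Requirements", "ImageSize"])
for signature, members in classes.items():
    print(signature, len(members))
```

The negotiator's work then grows with the number of distinct classes rather than the number of queued jobs.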

13 Avoiding Black Holes
› Condor can keep track of the last N resource matches for a job
› This can be used to prefer the same machine if the job is restarted
› It can also be used to avoid that machine on restart, a first step toward avoiding “black holes”: machines that consume jobs but always fail to run them
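A bounded history of recent matches can drive both behaviors described above. This minimal Python sketch assumes a per-job history of length N and a simple first-fit choice among candidate machines; `MatchHistory` and the machine names are hypothetical, and this is not Condor's implementation.

```python
from collections import deque


class MatchHistory:
    """Remember the last n machines a job was matched to (hypothetical model)."""

    def __init__(self, n: int):
        self.recent = deque(maxlen=n)  # the oldest entry is dropped automatically

    def record(self, machine: str) -> None:
        self.recent.append(machine)

    def choose(self, candidates: list[str], avoid_previous: bool) -> str:
        """Pick a machine from a non-empty candidate list.

        avoid_previous=True  -> skip recently matched machines (black-hole avoidance)
        avoid_previous=False -> favour a recently matched machine
        Falls back to the first candidate if none satisfies the preference.
        """
        if not candidates:
            raise ValueError("no candidate machines")
        for m in candidates:
            if (m in self.recent) != avoid_previous:
                return m
        return candidates[0]


history = MatchHistory(n=2)
history.record("m1")
history.record("m2")
print(history.choose(["m1", "m3"], avoid_previous=True))   # m3
print(history.choose(["m3", "m1"], avoid_previous=False))  # m1
```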

14 New Utilities
› ‘condor_q -held’ lists held jobs and the reason each was put on hold
› ‘condor_config_val -config’ tells you where (file and line number) an attribute is defined
› ‘condor_rm -f’ forcefully removes a job, which is particularly useful when the Globus jobmanager is not cooperating
› ‘condor_fetch_log’ grabs a log file from a remote machine:
    condor_fetch_log c2-15.cs.wisc.edu STARTD
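Finding where a setting is defined amounts to scanning the configuration files in the order they are read and remembering the last assignment. The Python sketch below is a simplified, hypothetical model of that lookup (`find_definition` is not a Condor API): it matches `NAME = value` lines case-insensitively, skips comment lines, and ignores macro expansion and line continuations.

```python
import re

# "NAME = value" at the start of a line; comment lines ("# ...") never match.
ASSIGNMENT = re.compile(r"^\s*([A-Za-z_][A-Za-z0-9_.]*)\s*=")


def find_definition(name, config_files):
    """Return (file, line_number) of the last assignment to name, or None.

    config_files is a list of (filename, text) pairs in the order they are
    read; a later assignment overrides an earlier one, so the last match wins.
    """
    where = None
    for filename, text in config_files:
        for lineno, line in enumerate(text.splitlines(), start=1):
            m = ASSIGNMENT.match(line)
            if m and m.group(1).lower() == name.lower():
                where = (filename, lineno)
    return where


files = [
    ("condor_config", "# global settings\nCOLLECTOR_HOST = a\nFOO = 1\n"),
    ("condor_config.local", "\ncollector_host = b\n"),
]
print(find_definition("COLLECTOR_HOST", files))  # ('condor_config.local', 2)
```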

15 Smarter File Transfer
› New file transfer mechanism:
    ShouldTransferFiles = YES | NO | IF_NEEDED
  - YES: always transfer files to the execution site
  - NO: rely on a shared filesystem
  - IF_NEEDED: automatically transfer the files if the submit and execute machines are not in the same FileSystemDomain
› Very useful for cross-platform submission and for flocking
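The IF_NEEDED rule reduces to comparing the FileSystemDomain of the two machines. Here is a minimal Python sketch of the decision; `should_transfer` and the domain strings are hypothetical, and it assumes that machines with equal FileSystemDomain values really do share a filesystem.

```python
def should_transfer(setting: str, submit_domain: str, execute_domain: str) -> bool:
    """Decide whether input files must be transferred to the execute machine.

    setting is one of YES, NO, IF_NEEDED (case-insensitive).
    The two domains stand in for each machine's FileSystemDomain value.
    """
    mode = setting.strip().upper()
    if mode == "YES":
        return True   # always ship files to the execution site
    if mode == "NO":
        return False  # rely on a shared filesystem
    if mode == "IF_NEEDED":
        # A shared filesystem is assumed only within the same domain.
        return submit_domain != execute_domain
    raise ValueError(f"unknown setting: {setting!r}")


print(should_transfer("IF_NEEDED", "cs.wisc.edu", "cs.wisc.edu"))  # False
print(should_transfer("IF_NEEDED", "cs.wisc.edu", "example.org"))  # True
```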

16 Submit-Time File Staging
› When submitting a job, you can tell Condor to create a “sandbox” of all necessary input files with ‘condor_submit -s’
› After completion, the job can stay in the queue by using a ‘leave_in_queue’ expression
› Output files are then fetched manually

17 New Installer
› For Windows
  - Based on MSI (Microsoft Software Installer)
  - Batch install option
› For UNIX
  - Version 6.6.0 will be available in RPMs
  - Command-line options specify the installation parameters, and no questions are asked
  - Easier to automate

18 ClassAds
› ClassAd attributes can be dynamically linked to external functions
  - Example: [ label = "uptime"; value = some_func_that_calls_uptime() ]
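To illustrate the idea of an attribute bound to a function, here is a toy Python model in which a callable value is re-evaluated on each lookup. `DynamicAd` is a hypothetical class; it does not reproduce the ClassAd library's actual mechanism for registering external functions, and a counter stands in for a real `uptime` probe.

```python
from itertools import count


class DynamicAd:
    """Toy ClassAd-like record whose attribute values may be functions.

    A callable value is invoked on every lookup, so the attribute reflects
    whatever the function returns at that moment instead of a stored copy.
    """

    def __init__(self, **attrs):
        self._attrs = attrs

    def __getitem__(self, name):
        value = self._attrs[name]
        return value() if callable(value) else value


# A counter stands in for a real external probe such as one that runs `uptime`.
ticks = count()
ad = DynamicAd(label="uptime", value=lambda: next(ticks))
print(ad["label"])  # uptime
print(ad["value"])  # 0
print(ad["value"])  # 1: re-evaluated on each lookup
```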

19 Misc New Features
› Jobs can be submitted via GRAM (the Globus Gatekeeper)
› Daemons do not have to run as ‘root’ or ‘condor’ to have multiple different users submitting
› Rudimentary load balancing between checkpoint servers, by picking one randomly from a list
› More job policy expressions
  - PERIODIC_RELEASE
  - GLOBUS_RESUBMIT
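Picking a checkpoint server at random is a very small policy. The Python sketch below assumes a non-empty list of server names (the hostnames are placeholders) and takes an optional random generator so a demo can be made repeatable; `pick_checkpoint_server` is a hypothetical name, not a Condor function.

```python
import random


def pick_checkpoint_server(servers, rng=random):
    """Pick one checkpoint server uniformly at random from a non-empty list."""
    if not servers:
        raise ValueError("no checkpoint servers configured")
    return rng.choice(servers)


servers = ["ckpt1.example.org", "ckpt2.example.org", "ckpt3.example.org"]
rng = random.Random(42)  # seeded only to make the demo repeatable
print([pick_checkpoint_server(servers, rng) for _ in range(5)])
```

Random choice needs no shared state between submitters, which is what keeps it simple, but it ignores how loaded each server actually is.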

20 Conclusion
› Todd Tannenbaum will tell you about the roadmap for future work
› Questions?

