Condor Tutorial NCSA Alliance ‘98 Presented by: The Condor Team University of Wisconsin-Madison

1 Condor Tutorial NCSA Alliance ‘98
Presented by: The Condor Team, University of Wisconsin-Madison
Email: condor-admin@cs.wisc.edu
URL: http://www.cs.wisc.edu/condor

2 Condor Tutorial, NCSA Alliance '98, April 27th 1998
Welcome to the Condor Tutorial!
• Introductions
• What is Condor? A system for High Throughput Computing

3 The “Religion” behind High Throughput Computing
Key Concepts:
– High Throughput Computing (HTC)
– Distributively owned resources

4 Performance vs. Throughput
• High Performance - Very large amounts of processing capacity over short time periods (FLOPS - Floating Point Operations Per Second)
• High Throughput - Large amounts of processing capacity sustained over very long time periods (FLOPY - Floating Point Operations Per Year)
FLOPY ≠ 30758400*FLOPS
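
The gap between a peak rate and a yearly total can be sketched with a little arithmetic. The peak rate and availability fraction below are hypothetical values chosen only for illustration; they do not come from the tutorial.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # non-leap year


def sustained_flopy(peak_flops, availability):
    """Operations delivered in one year.

    availability is the fraction of the year (0.0 to 1.0) during which
    the machine actually does useful work for the application.
    """
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be between 0 and 1")
    return peak_flops * SECONDS_PER_YEAR * availability


peak = 100e6                         # hypothetical 100 MFLOPS workstation
naive = peak * SECONDS_PER_YEAR      # assumes the peak rate is held all year
delivered = sustained_flopy(peak, availability=0.40)  # hypothetical 40%

print(f"naive FLOPY:     {naive:.3e}")
print(f"delivered FLOPY: {delivered:.3e}")
```

Under these assumptions, yearly throughput depends as much on how long the capacity is sustained as on the peak rate.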

5 Distributed Ownership
• Due to the dramatic decrease in the cost-performance ratio of hardware, powerful computing resources are owned today by individuals, groups, departments, …
– Huge increase in the aggregate processing capacity owned by the organization
– Much smaller increase in the capacity accessible by a single person

6 The Challenge and Motivation behind Condor
Turn large collections of existing distributively owned (and perhaps non-dedicated) computing resources into effective High Throughput Computing Environments
Minimize Wait while Idle

7 Road Block: Sociology
Make owners (& system administrators) happy.
Give owners full control over
– when and by whom private resources are used for HTC
– impact of HTC on private Quality of Service
– membership and information on HTC related activities
Require no changes to existing software, and make it easy
– to install, configure, monitor, and maintain
Happy owners ⇒ more resources ⇒ higher throughput

8 Road Block: Robustness
To be effective, an HTC environment must run as a 24-7-365 operation.
– Customers count on it
– Debugging and fault isolation may be very time-consuming processes
– In a large distributed system, everything that might go wrong will go wrong
Robust system ⇒ less down time ⇒ higher throughput

9 Road Block: Portability
To be effective, the HTC software must run on and support the latest and greatest hardware and software.
– Owners select hardware and software according to their needs and tradeoffs
– Customers expect it to be there
– Application developers expect only a few (if any) changes to their applications
Portability ⇒ more platforms ⇒ higher throughput

10 Condor’s unique mechanisms for HTC
• Matchmaking - enables requests for services and offers to provide services to find each other.
• Checkpointing - enables preemptive-resume scheduling (go ahead and use it as long as it is available!).
• Remote I/O - enables remote (from execution site) access to local (at submission site) data.

11 Condor Viewpoints
• Owner
– Creates resource offers
• User
– Creates resource requests
• Administrator
– Drinks Coffee
– Manages the pool-wide configuration
– Could also be the Owner

12 Condor Agents
• Condor Resource Agent
– condor_startd daemon
– allows a machine to execute Condor jobs
– enforces owner policy
• Condor User Agent
– condor_schedd daemon
– allows a machine to submit jobs to a pool

13 The Tutorial Installation
[Diagram labels: Your Workstation, schedd, CentralManager, Alliance ‘98 Pool, startd]

14 The Tutorial Installation
[Diagram labels: Your Workstation, schedd, startd, CentralManager, Alliance ‘98 Pool, UW-Madison Pool]

15 Hands-on: Example #1
Joining the UW-Madison CS Condor Pool as a Submit-only node

16 Overview of Submitting a Job to Condor
• Create a Submit-Description File
• Run condor_compile to relink your program with the Condor Libraries, if Condor’s Checkpointing or Remote I/O support is desired
• Run condor_submit, which sends your request to the User Agent (condor_schedd)
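
As a rough illustration of the first step, the sketch below builds the text of a small submit-description file. The helper name, the file names, and the program name are hypothetical; only common submit commands (universe, executable, arguments, output, error, log, queue) are used, and the exact set a given Condor version accepts should be checked against the manual.

```python
def make_submit_description(executable, universe="vanilla", arguments="",
                            output="job.out", error="job.err",
                            log="job.log", count=1):
    """Return the text of a minimal submit-description file.

    All file names here are placeholders chosen for the example.
    """
    lines = [
        f"universe   = {universe}",
        f"executable = {executable}",
    ]
    if arguments:
        lines.append(f"arguments  = {arguments}")
    lines += [
        f"output     = {output}",
        f"error      = {error}",
        f"log        = {log}",
        f"queue {count}",
    ]
    return "\n".join(lines) + "\n"


# "my_program" is a hypothetical executable name.
text = make_submit_description("my_program", universe="standard")
print(text)
```

The resulting text would be saved to a file and passed to condor_submit; for a program that needs checkpointing or remote I/O, the executable would first be relinked with condor_compile.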

17 Condor System Structure

18 Hands-on: Example #2
Submit Jobs to Condor

19 Condor Universes
A Universe specifies a Condor runtime environment:
STANDARD
– Supports Checkpointing
– Supports Remote System Calls
– Has some limitations…
VANILLA
– Any Unix executable (shell scripts, etc.)
– No Condor Checkpointing or Remote I/O

20 Hands-on: Example #3
Tour of User Tools/Commands

21 User Priorities in Condor
• Each active user in the pool has a user priority
• Viewed or changed with condor_userprio
• Like golf: the lower, the better
• A given user’s share of available machines is inversely related to the ratio between user priorities.
Example: Fred’s priority is 10, Joe’s is 20. Fred will be allocated twice as many machines as Joe.
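
The Fred and Joe example can be worked through in a few lines. This is a sketch of the proportional-share arithmetic only, assuming each user's share is proportional to the reciprocal of their priority value; the function name and the pool size of 30 machines are made up for illustration.

```python
def machine_shares(priorities, machines):
    """Split `machines` among users in inverse proportion to priority value.

    priorities maps a user name to a priority value (lower is better).
    """
    weights = {user: 1.0 / value for user, value in priorities.items()}
    total = sum(weights.values())
    return {user: machines * w / total for user, w in weights.items()}


shares = machine_shares({"fred": 10, "joe": 20}, machines=30)
for user, share in shares.items():
    print(f"{user}: about {share:.1f} machines")
```

With 30 machines, Fred ends up with roughly 20 and Joe with roughly 10, which is the two-to-one split described on the slide.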

22 User Priorities in Condor, cont.
• Condor continuously adjusts user priorities over time
– machines allocated > priority: priority worsens
– machines allocated < priority: priority improves
• Priority Preemption
– Higher priority users will grab machines away from lower priority users (thanks to Checkpointing…)
– Starvation is prevented
– Priority “thrashing” is prevented
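
One way to picture the adjustment rule is a priority value that drifts toward the number of machines a user currently holds. The slide does not give Condor's actual formula, so the smoothing rule and the `rate` parameter below are assumptions made purely for illustration.

```python
def update_priority(priority, machines_allocated, rate=0.5):
    """Move the priority value part of the way toward current usage.

    Lower values are better, so using more machines than the current
    priority value makes the value rise (worsen), and using fewer makes
    it fall (improve). `rate` is a hypothetical smoothing factor.
    """
    return priority + rate * (machines_allocated - priority)


priority = 10.0
history = []
for _ in range(5):
    # The user holds 20 machines for five adjustment intervals.
    priority = update_priority(priority, machines_allocated=20)
    history.append(priority)
print(history)
```

Because the value changes gradually rather than jumping, a rule of this shape would also damp rapid swings between users, which is one plausible reading of the "thrashing" remark.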

23 Parallel Jobs in Condor
Condor can run parallel applications (written to the popular PVM message passing library)

24 Master-Worker Paradigm
Condor-PVM is designed to run PVM applications which follow the master-worker paradigm.
• Master
– has a pool of work, sends pieces of work to the workers, manages the work and the workers
• Worker
– gets a piece of work, does the computation, sends the result back
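
The shape of a master-worker program can be sketched without PVM at all. The example below uses Python threads on a single machine as a stand-in for PVM tasks, so it shows only the division of roles, not how Condor-PVM schedules workers; the work itself (summing squares) is a made-up placeholder.

```python
from concurrent.futures import ThreadPoolExecutor


def worker(piece):
    """Worker: take one piece of work, compute, and return the result."""
    return sum(x * x for x in piece)


def master(work_pool, num_workers=4):
    """Master: hand pieces of work to the workers and combine the results."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        results = list(pool.map(worker, work_pool))
    return sum(results)


# The pool of work: three independent pieces.
pieces = [range(0, 100), range(100, 200), range(200, 300)]
total = master(pieces)
print(total)
```

Because each piece is independent, losing a worker costs only the piece it was holding, which is what makes this pattern a good fit for non-dedicated machines.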

25 What does Condor-PVM do?
Condor acts as the PVM resource manager.
• All pvm_addhost requests get re-mapped to Condor.
Condor dynamically constructs PVM virtual machines out of non-dedicated desktop machines.
• When a machine leaves the pool, the user gets notified via the normal PVM notification mechanisms.

26 How to compile and submit Condor-PVM jobs
• Binary Compatible
– Compile and link with PVM library just as normal PVM applications. No need to link with Condor.
• Submit
– In the submit file set:
universe = PVM
machine_count = ..

27 Classified Advertisements
• ClassAds
– Language for expressing attributes
– Semantics for evaluating them
• Intuitively, a ClassAd is a set of named expressions
– Each named expression is an attribute
• Expressions are similar to C …
– Constants, attribute references, operators

28 Classified Advertisements: Example
MyType = "Machine"
TargetType = "Job"
Name = "froth.cs.wisc.edu"
StartdIpAddr = " "
Arch = "INTEL"
OpSys = "SOLARIS251"
VirtualMemory = 225312
Disk = 35957
KFlops = 21058
Mips = 103
LoadAvg = 0.011719
KeyboardIdle = 12
Cpus = 1
Memory = 128
Requirements = LoadAvg < ... && KeyboardIdle > 15 * 60
Rank = 0

29 Classified Advertisements: Matching
• ClassAds are always considered in pairs
– Does ClassAd A match ClassAd B (and vice versa)?

30 Classified Advertisements: Examples
• ClassAd A
MyType = "Apartment"
TargetType = "ApartmentRenter"
SquareArea = 3500
RentOffer = 1000
HeatIncluded = False
OnBusLine = True
Rank = UnderGrad==False + TARGET.RentOffer
Requirements = MY.RentOffer - TARGET.RentOffer < 150
• ClassAd B
MyType = "ApartmentRenter"
TargetType = "Apartment"
UnderGrad = False
RentOffer = 900
Rank = 1/(TARGET.RentOffer + 100.0) + 50*HeatIncluded
Requirements = OnBusLine && SquareArea > 2700
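
The two-sided match between ClassAd A and ClassAd B can be imitated with plain Python dictionaries. This is a sketch, not the ClassAd evaluator: each Requirements expression is translated by hand into a function of (my, target), unqualified names such as OnBusLine are resolved by hand to the ad that defines them, and the check that MyType and TargetType agree is an assumption about how the pairing works.

```python
apartment = {
    "MyType": "Apartment", "TargetType": "ApartmentRenter",
    "SquareArea": 3500, "RentOffer": 1000,
    "HeatIncluded": False, "OnBusLine": True,
    # Requirements = MY.RentOffer - TARGET.RentOffer < 150
    "Requirements": lambda my, target:
        my["RentOffer"] - target["RentOffer"] < 150,
}

renter = {
    "MyType": "ApartmentRenter", "TargetType": "Apartment",
    "UnderGrad": False, "RentOffer": 900,
    # Requirements = OnBusLine && SquareArea > 2700
    "Requirements": lambda my, target:
        target["OnBusLine"] and target["SquareArea"] > 2700,
    # Rank = 1/(TARGET.RentOffer + 100.0) + 50*HeatIncluded
    "Rank": lambda my, target:
        1 / (target["RentOffer"] + 100.0) + 50 * target["HeatIncluded"],
}


def matches(a, b):
    """True when each ad's Requirements holds against the other ad."""
    types_ok = (a["TargetType"] == b["MyType"]
                and b["TargetType"] == a["MyType"])
    return bool(types_ok
                and a["Requirements"](a, b)
                and b["Requirements"](b, a))


print(matches(apartment, renter))
print(renter["Rank"](renter, apartment))
```

A renter offering only 800 would fail ClassAd A's Requirements, since 1000 - 800 is not below 150, so the pair would not match even though the renter's own constraints are satisfied.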

31 ClassAds in the Condor System
• ClassAds allow Condor to be a general system
– Constraints and ranks on matches expressed by entities themselves
– Only priority logic integrated into Manager
• All principal entities in the Condor system are represented by ClassAds
– Machines, Jobs, Submitters

32 ClassAds in Condor: Requirements and Rank (Example)
Friend = Owner == "tannenba" || Owner == "wright"
ResearchGroup = Owner == "jbasney" || Owner == "raman"
Trusted = Owner != "rival" && Owner != "riffraff"
Requirements = Trusted && ( ResearchGroup || LoadAvg < ... && KeyboardIdle > 15*60 )
Rank = Friend + ResearchGroup*10
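
To see how the Rank expression orders job owners, the owner-dependent attributes can be evaluated by hand. The sketch below mirrors only Friend, ResearchGroup, Trusted, and Rank; it leaves Requirements out because that expression also depends on machine state. The function name and the owner "someone" are hypothetical.

```python
def evaluate(owner):
    """Evaluate the owner-dependent attributes for one job owner."""
    friend = owner in ("tannenba", "wright")
    research_group = owner in ("jbasney", "raman")
    trusted = owner not in ("rival", "riffraff")
    # Booleans count as 0 or 1 in the arithmetic, as in the Rank expression.
    rank = int(friend) + int(research_group) * 10
    return {"Friend": friend, "ResearchGroup": research_group,
            "Trusted": trusted, "Rank": rank}


for owner in ("raman", "wright", "someone", "rival"):
    print(owner, evaluate(owner))
```

Research-group members get the highest rank, friends come next, and everyone else ranks zero; untrusted owners are excluded by Requirements regardless of rank.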

33 Hands-on: Example #4
Submit Jobs with ClassAd Constraints

34 Resource Owner’s Viewpoint
• In Condor, the owner of the resource (machine owner) can dictate the terms and conditions under which that resource can be used
• How? Configure the Resource Agent’s Policy (condor_startd configuration)

35 Resource Agent Configuration Expressions
• START expression
– When TRUE, Condor can start a job
– True = Unclaimed State
– False = Owner State
• SUSPEND expression
– When TRUE, Condor suspends any job running on this machine
• CONTINUE expression
– When TRUE, Condor continues a suspended job

36 Resource Agent Configuration Expressions, cont.
• VACATE expression
– When TRUE, kick the job off of the machine (via a Checkpoint if possible)
• KILL expression
– When TRUE, kill the job immediately
– No Checkpoint
– On UNIX: a “kill -9”

37 Resource Agent Configuration Expressions, cont.
[Flowchart: START, WANT_SUSPEND, SUSPEND, VACATE, WANT_VACATE, KILL, with True/False branches]

38 Resource Agent Configuration Expressions, cont.
• Default Setup
WANT_VACATE : True
WANT_SUSPEND : True
START : Keyboard_Idle && CPU_Idle
SUSPEND : Keyboard_Busy || CPU_Busy
CONTINUE : Keyboard and CPU idle again
VACATE : If Suspended > 10 minutes
KILL : If spent > 10 minutes in VACATE state
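
The default setup can be read as a small state machine. The sketch below is a simplification under stated assumptions: WANT_SUSPEND and WANT_VACATE are both taken as True, the state names "running", "suspended", "vacating", and "killed" are labels invented here, and CONTINUE is checked before VACATE when both could apply. It is not the condor_startd implementation.

```python
def next_state(state, keyboard_idle, cpu_idle, minutes_in_state):
    """Return the next machine/job state under the default policy."""
    idle = keyboard_idle and cpu_idle
    if state in ("owner", "unclaimed"):
        # START: Keyboard_Idle && CPU_Idle (True = Unclaimed, False = Owner)
        return "unclaimed" if idle else "owner"
    if state == "running":
        # SUSPEND: Keyboard_Busy || CPU_Busy
        return "running" if idle else "suspended"
    if state == "suspended":
        if idle:
            return "running"        # CONTINUE: keyboard and CPU idle again
        if minutes_in_state > 10:
            return "vacating"       # VACATE: suspended for over 10 minutes
        return "suspended"
    if state == "vacating":
        # KILL: more than 10 minutes spent vacating
        return "killed" if minutes_in_state > 10 else "vacating"
    raise ValueError(f"unknown state: {state}")


# The owner returns to the keyboard while a job is running.
print(next_state("running", keyboard_idle=False, cpu_idle=True,
                 minutes_in_state=0))
```

Suspending first and vacating only after a delay means a short burst of owner activity costs the job a pause rather than a checkpoint and migration.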

39 Hands-on: Example #5
UW-Madison CS Pool Startd Policy

40 Condor Administrator Features
• The condor_master is the administrator’s best friend
– Watches/restarts other daemons
– Sends email if it notices suspicious problems
– Runs condor_preen
– Provides administrator remote control

41 Condor Administrator Commands
• Administrator Commands
– condor_off [ hostname … ]
  Down entire pool: condor_off `cat machines-file`
– condor_on
– condor_restart
– condor_reconfig (“on-the-fly” reconfiguration)
– condor_vacate
• These commands could be used by the Owner as well, if desired

42 Condor Host-based Access Control
• HOSTALLOW_* and HOSTDENY_* settings grant machines (subnets, domains) different access levels:
– READ access
– WRITE access
– ADMINISTRATOR access
– OWNER access

43 Example: Simple Host-based Access Control
HOSTDENY_READ = *.mil
HOSTALLOW_WRITE = *.ncsa.uiuc.edu
HOSTDENY_WRITE = ppp*.ncsa.uiuc.edu, 172.44.*
HOSTALLOW_ADMINISTRATOR = bigcheese.ncsa.uiuc.edu
HOSTALLOW_OWNER = $(FULL_HOSTNAME), $(HOSTALLOW_ADMINISTRATOR)
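
How such allow and deny lists might combine can be sketched with shell-style wildcard matching. The precedence used here is an assumption for illustration (a deny match always wins; with no allow list, any host that is not denied is accepted), and the OWNER level is omitted because it relies on macro expansion. The test host names other than bigcheese.ncsa.uiuc.edu are made up.

```python
from fnmatch import fnmatchcase

CONFIG = {
    "HOSTDENY_READ": ["*.mil"],
    "HOSTALLOW_WRITE": ["*.ncsa.uiuc.edu"],
    "HOSTDENY_WRITE": ["ppp*.ncsa.uiuc.edu", "172.44.*"],
    "HOSTALLOW_ADMINISTRATOR": ["bigcheese.ncsa.uiuc.edu"],
}


def is_allowed(level, host, config=CONFIG):
    """Decide whether `host` (a name or dotted IP string) gets `level` access."""
    host = host.lower()
    allow = config.get(f"HOSTALLOW_{level}")
    deny = config.get(f"HOSTDENY_{level}", [])
    if any(fnmatchcase(host, pattern) for pattern in deny):
        return False            # assumption: deny entries take precedence
    if allow is None:
        return True             # assumption: no allow list, so not restricted
    return any(fnmatchcase(host, pattern) for pattern in allow)


print(is_allowed("WRITE", "node1.ncsa.uiuc.edu"))
print(is_allowed("WRITE", "ppp12.ncsa.uiuc.edu"))
```

Under these assumptions, dial-up hosts inside ncsa.uiuc.edu are refused write access even though the domain as a whole is allowed, which appears to be the intent of pairing HOSTALLOW_WRITE with HOSTDENY_WRITE.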

44 Configuration File Hierarchy
• condor_config
– Pool-wide default
– Condor pool administrator’s requirements
• condor_config.local
– Overrides for a specific machine
– Reflects Owner’s requirements
• condor_config.root
– System Administrator requirements
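
The layering of the three files can be pictured as successive dictionary updates. The sketch assumes the files are applied in the order listed, with later files overriding earlier ones; the slide does not state where condor_config.root falls in the order, so applying it last is an assumption, and the setting values are placeholders.

```python
def effective_config(pool_wide, local, root):
    """Merge settings so that later layers override earlier ones."""
    merged = dict(pool_wide)   # condor_config: pool-wide defaults
    merged.update(local)       # condor_config.local: per-machine overrides
    merged.update(root)        # condor_config.root: assumed to be applied last
    return merged


pool_wide = {"WANT_SUSPEND": "True", "WANT_VACATE": "True"}
local = {"WANT_SUSPEND": "False"}   # hypothetical owner preference
root = {"HOSTALLOW_ADMINISTRATOR": "bigcheese.ncsa.uiuc.edu"}

config = effective_config(pool_wide, local, root)
print(config)
```

Here the machine keeps the pool-wide WANT_VACATE setting while its owner's local file turns suspension off.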

45 Future Directions
• Condor for Windows NT
• SMP support
• More parallel job support
– Checkpoint parallel jobs
– MPI, MPI-2
• Flocking
…

46 Obtaining Condor
• Condor can be downloaded from the Condor web site at: http://www.cs.wisc.edu/condor
• Complete Users and Administrators manual available: http://www.cs.wisc.edu/condor/manual
• Contracted Support is available
• Questions? Email: condor-admin@cs.wisc.edu

47 Thank You!!
Thank you for your interest!
The Condor Team: Miron Livny, Marvin Solomon, Todd Tannenbaum, Derek Wright, Bin Song, Rajesh Raman, Tom Stanis, Jim Basney, Adiel Yoaz

