Condor Parallel Universe
Greg Thain
Computer Sciences Department, University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Overview
› Task vs. Job Parallelism
› New Condor support for Task Parallelism
› Other goodies
The Talk in One Slide
› Parallel Universe can run any* task-parallel job
› Not just MPICH 1.2.4
› Not just MPI…
Job vs. Task Parallelism
› Condor has historically focused on job parallelism
› Job parallelism is achieved either manually or via DAGMan
› The rest of this talk covers task parallelism
› Task parallelism is also possible via PVM or MW
Parallel Universe
› An adaptation of the MPI universe
› Modifications based on experience with MPI and on user feedback
› But more than just MPI
Scheduling
› Needs a “dedicated scheduler”
› “Dedicated” has a specific meaning in Condor
› Nodes running MPI require a dedicated scheduler
› A given machine can have many opportunistic schedulers, but only one dedicated scheduler
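To make the one-dedicated-scheduler rule concrete, here is a minimal Python sketch of a toy pool in which each machine records at most one dedicated scheduler name as a single string. In a real pool this is, as far as we know, set through a DedicatedScheduler attribute in each machine's configuration. The schedd names, host names, and the `dedicated_pool` helper below are hypothetical.

```python
from typing import Dict, List, Optional

def dedicated_pool(machines: Dict[str, Optional[str]], schedd: str) -> List[str]:
    """Return the sorted names of machines this schedd may use for dedicated jobs.

    Each machine maps to a single scheduler name (or None), so by construction
    no machine can be claimed for dedicated jobs by two schedulers at once.
    """
    return sorted(name for name, owner in machines.items() if owner == schedd)

# Hypothetical pool: node3 has no dedicated scheduler (opportunistic use only).
pool = {
    "node1": "DedicatedScheduler@submit.example.org",
    "node2": "DedicatedScheduler@submit.example.org",
    "node3": None,
    "node4": "DedicatedScheduler@other.example.org",
}
print(dedicated_pool(pool, "DedicatedScheduler@submit.example.org"))
```

Modeling the attribute as one value rather than a list is the design choice that encodes the rule: opportunistic claims are not restricted this way, but dedicated ones are.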
DedicatedScheduler Surprises
› The DedicatedScheduler co-opts the normal negotiation cycle
› Preemption and scheduling work differently than under opportunistic scheduling
› The DedicatedScheduler schedules first-fit, sorted by UserJobPrio
› The output of condor_q -analyze can be a mystery!
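The first-fit policy can be illustrated with a toy model. This is not Condor's implementation: the `Job` class, the `first_fit` function, the convention that a higher `prio` runs earlier, and the choice to pass over a job that does not fit and keep going are all assumptions for illustration. It shows one way strict priority order plus first fit can surprise users: a small lower-priority job may start while a larger higher-priority job keeps waiting.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Job:
    job_id: str
    prio: int   # assumed: higher value is considered earlier
    nodes: int  # number of machines the parallel job needs

def first_fit(jobs: List[Job], free_machines: List[str]) -> Dict[str, List[str]]:
    """Toy first-fit: visit jobs by descending priority and give each job the
    first free machines in list order if enough remain; otherwise skip it."""
    free = list(free_machines)
    placed: Dict[str, List[str]] = {}
    # sorted() is stable, so equal-priority jobs keep their submission order.
    for job in sorted(jobs, key=lambda j: -j.prio):
        if job.nodes <= len(free):
            placed[job.job_id] = free[:job.nodes]
            free = free[job.nodes:]
    return placed

jobs = [Job("a", prio=0, nodes=1), Job("b", prio=5, nodes=3), Job("c", prio=3, nodes=2)]
print(first_fit(jobs, ["n1", "n2", "n3", "n4"]))
# {'b': ['n1', 'n2', 'n3'], 'a': ['n4']}
```

Here job "c" outranks job "a" but needs two machines when only one is left, so "a" runs first.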
Job Startup
› Same file transfer, etc. as the vanilla universe
› One shadow, many starters
› The starter runs sshd on all machines and performs the key exchange
› The starter runs the executable on the first machine (the head node, rank 0)
Your Script Here
› The script on the head node has access to a contact file
› We provide sample scripts for LAM and MPICH
› We try to mimic “by hand” startup
› Use condor_ssh to start remote jobs
› When the script exits, Condor cleans up
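A head-node script typically turns the contact file into a host list for the MPI launcher. Below is a minimal Python sketch, assuming each contact-file line begins with the node's rank and host name followed by other whitespace-separated fields. That exact format, the sample contents, and the helper names `parse_contact_file` and `machine_file` are assumptions, not Condor's documented interface.

```python
from typing import List, Tuple

def parse_contact_file(text: str) -> List[Tuple[int, str]]:
    """Return (rank, host) pairs sorted by rank.

    ASSUMED format: each non-blank line starts with the node's rank and
    host name; any further fields are ignored here.
    """
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) >= 2:
            entries.append((int(fields[0]), fields[1]))
    return sorted(entries)

def machine_file(entries: List[Tuple[int, str]]) -> str:
    """One host per line, head node (rank 0) first, for an MPI launcher."""
    return "".join(host + "\n" for _, host in entries)

# Hypothetical contact-file contents; hosts, ports, and paths are made up.
contact = (
    "1 node7.example.org 4444 user /scratch/dir_1\n"
    "0 node3.example.org 4445 user /scratch/dir_0\n"
)
entries = parse_contact_file(contact)
head_rank, head_host = entries[0]  # rank 0 is the head node running the script
print(machine_file(entries), end="")
```

Sorting by rank keeps the head node first, mirroring a hand-written machines file. The launcher would then reach the remaining hosts through condor_ssh.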