
Slide 1: HEPiX Spring 2005 - Batch Scheduling at JLab
Sandra Philpott, Scientific Computing Manager, Physics Computer Center
Thomas Jefferson National Accelerator Facility
Operated by the Southeastern Universities Research Association for the U.S. Department of Energy
HEPiX, Karlsruhe, May 9-13, 2005

Slide 2: Overview of Resources
Experimental Physics:
- Batch farm + mass storage
- Raw data storage, data replay and analysis
- 200 dual Xeons
- http://auger.jlab.org/scicomp
Theoretical Physics - High Performance Computing (HPC), Lattice QCD:
- 3 clusters of meshed machines: 384 GigE, 256 GigE, 128 Myrinet
- Parallel jobs
- http://www.jlab.org/hpc

Slide 3: Schedulers
LSF - Physics offline reconstruction and analysis:
- Auger, a locally developed front end
- Tight integration with JASMine, our mass storage system
- Consider PBS in time for Hall D and GlueX? Cost savings, compatibility with HPC
- jsub user command
OpenPBS - Lattice QCD parallel computing:
- Torque
- UnderLord, a locally developed scheduler; also provides trend analysis, projections, graphs, etc.
- Considering Maui as a replacement for UnderLord
- qsub user command

Slide 4: Queue Configuration - LSF
- Production: bulk of the jobs
- Priority: quick jobs, less than 30 min.
- Low priority: intended for simulations
- Idle: typically mprime
- Maintenance: for SysAdmin

Queue         Priority  Policy        Preempt  Time Opt  Time (min)
Priority      100       FIFO          No       Yes       30
Production    80        Fairshare     No       -
Low Priority  5         Fairshare/RR  No       Yes       2880
Idle          1         FIFO          Yes      No
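A queue policy like the one in the table above would typically be encoded in LSF's lsb.queues file. The following is a hedged sketch only, not the site's actual configuration: the queue names come from the slide, but the exact keyword spellings depend on the LSF version in use, and the fairshare share values shown are illustrative.

```
Begin Queue
QUEUE_NAME = priority
PRIORITY   = 100
RUNLIMIT   = 30                              # minutes
End Queue

Begin Queue
QUEUE_NAME = production
PRIORITY   = 80
FAIRSHARE  = USER_SHARES[[default, 1]]       # illustrative shares
End Queue

Begin Queue
QUEUE_NAME = low_priority
PRIORITY   = 5
FAIRSHARE  = USER_SHARES[[default, 1]]       # illustrative shares
RUNLIMIT   = 2880                            # 48 hours, in minutes
End Queue

Begin Queue
QUEUE_NAME = idle
PRIORITY   = 1
PREEMPTION = PREEMPTABLE                     # idle jobs can be preempted
End Queue
```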

Slide 5: Queue Configuration - PBS
Batch queue names:
- 2m: Master@qcd2madm
- 3g: Master@qcd3gadm
- 4g: Panel01@qcd4gadm, Panel23@qcd4gadm, Panel45@qcd4gadm
Queue and machine limits:
- 2m: 24 hours, 8 GB /scratch, 256 MB memory
- 3g: 24 hours, 20 GB /scratch, 256 MB memory
- 4g: 24 hours, 20 GB /scratch, 512 MB memory
- Jobs that use the most nodes have the highest priority
UnderLord scheduling policy, defined by the admin:
- Job Age, Job Duration, Queue Priority, User Share, User Priority
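Per-queue limits like the ones above are normally set through PBS's qmgr interface. A hedged sketch for the 2m queue, assuming standard OpenPBS/Torque attribute names; the actual site configuration is not shown on the slide:

```shell
# Sketch only: create the 2m queue and apply the limits from the slide.
qmgr -c "create queue 2m queue_type=execution"
qmgr -c "set queue 2m resources_max.walltime = 24:00:00"
qmgr -c "set queue 2m resources_max.mem = 256mb"
qmgr -c "set queue 2m enabled = true"
qmgr -c "set queue 2m started = true"
```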

Slide 6: Sample Script - LSF
JOBNAME: job2
PROJECT: clas
COMMAND: /home/user/test/job2.script
OPTIONS: -debug
OS: solaris
INPUT_FILES: /mss/clas/raw/run1001.data
             /mss/clas/raw/run1002.data
             /mss/clas/raw/run1003.data
INPUT_DATA: fort.11
OTHER_FILES: /home/xxx/yyy/exp.database
TOTAPE
OUTPUT_DATA: recon.run100
OUTPUT_TEMPLATE: /mss/clas/prod1/OUTPUT_DATA

Slide 7: Sample Script - PBS
#!/bin/csh -f
setenv DEPEND ""
if ($#argv > 0) then
  setenv DEPEND "-W depend=afterok"
  foreach job ($argv)
    setenv DEPEND "${DEPEND}:$job"
  end
endif
qsub \
  -c n \
  -m ae -M akers@jlab.org \
  -l nodes=64:ppn=2,walltime=30 \
  -v SLEEPTIME=30 \
  -N MPI_CPI_Test \
  -p 1 \
  -q Master@qcdadm01 ${DEPEND} \
  /home/akers/TestJobs/MPI/cpi.csh
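The wrapper above chains jobs by building a qsub "-W depend=afterok" option from the job IDs passed on the command line. The same construction in POSIX sh (rather than the slide's csh), with the function name build_depend being mine, not JLab's:

```shell
#!/bin/sh
# Build a qsub dependency option from zero or more PBS job IDs.
# With no arguments, the result is empty and qsub gets no dependency.
build_depend() {
    DEPEND=""
    if [ $# -gt 0 ]; then
        DEPEND="-W depend=afterok"
        for job in "$@"; do
            DEPEND="${DEPEND}:${job}"   # append each job ID after a colon
        done
    fi
    printf '%s\n' "$DEPEND"
}

build_depend 101.qcdadm01 102.qcdadm01
# prints: -W depend=afterok:101.qcdadm01:102.qcdadm01
```

With "afterok", PBS releases the new job only after every listed job finishes successfully, which is what makes multi-stage chains safe to submit all at once.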

Slide 8: Resource Utilization
Experimental Physics:
- Efficient data flow: data are prestaged before jobs are admitted to the farm
- Data are spread over multiple file servers transparently
- Keeps batch farm CPUs 100% busy; no waiting on data to arrive
- Workaround: a resource request such as DISK_SPACE: 125 GB implies the newer systems with more memory
HPC/Lattice:
- Jobs may have an optimal resource spec but can use other configurations if the optimal one is not available
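The prestaging idea amounts to an admission check: hold a job until every input file has been staged from tape to cache disk, so CPUs never idle waiting on I/O. A minimal POSIX sh illustration of that check; all names here are hypothetical, and JASMine's actual staging logic is not shown on the slide:

```shell
#!/bin/sh
# Return success only if every named input file is present on cache disk.
all_staged() {
    for f in "$@"; do
        [ -f "$f" ] || return 1   # missing locally: still on tape
    done
    return 0
}

# Demo with temporary files standing in for staged raw-data files.
dir=$(mktemp -d)
touch "$dir/run1001.data" "$dir/run1002.data"
if all_staged "$dir/run1001.data" "$dir/run1002.data"; then
    echo "admit job to farm"
else
    echo "hold job until staging completes"
fi
rm -r "$dir"
```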

Slide 9: Summary
We would like:
- Common job submission for users, for both experimental and LQCD jobs and for both experimental and LQCD clusters
- For grid jobs, a common set of resource descriptors; a user can specify only the ones required
We are collaborating with STAR at BNL on RDL (Request Description Language):
- http://www.star.bnl.gov/STAR/comp/Grid/scheduler/rdl/index.html
We will soon become an Open Science Grid site:
- http://www.opensciencegrid.org

