1  Batch Software at JLAB. Ian Bird, Jefferson Lab. CHEP 2000, 7-11 February 2000

2  Introduction
Environment
  – Farms
  – Data flows
  – Software
Batch systems
  – JLAB software
  – LSF vs. PBS
  – Scheduler
Tape software
  – File pre-staging / caching

3  Environment
Computing facilities were designed to:
  – Handle a data rate of close to 1 TB/day
  – 1st-level reconstruction only (2 passes), matching the average data rate
  – Some local analysis, but mainly export of vastly reduced summary DSTs
Originally estimated requirements:
  – ~1000 SI95
  – 3 TB online disk
  – 300 TB tape storage, 8 Redwood drives

4  Environment – real
After 1 year of production running of CLAS (the largest experiment):
  – The detector is far cleaner than anticipated, which means:
    – Data volume is lower, ~500 GB/day
    – Data rate is 2.5x that anticipated (2.5 kHz)
    – The fraction of good events is larger
    – DST sizes are the same as the raw data (!)
  – Per-event processing time is much longer than the original estimates
  – Most analysis is done locally; no one is really interested in huge data exports
Other experiments also have large data rates (for short periods)

5  Computing implications
The CPU requirement is far greater
  – The current farm is 2650 SI95 and will double this year
The farm has a big mixture of work
  – Not all production; "small" analysis jobs too
  – We make heavy use of LSF hierarchical scheduling
Data access demands are enormous
  – DSTs are huge; many people, frequent accesses
  – Analysis jobs want many files
Tape access became a bottleneck
  – The farm can no longer be kept supplied with data

6  JLab Farm Layout
[Diagram, FY 2000 plan: farm systems (dual PII 300/400/450 MHz and dual PIII 500/650 MHz nodes, 18 GB local disk each, in lots of 10-25); mass storage servers (quad Sun E3000/E4000 with STK Redwood and 9840 tape drives and 150-400 GB staging disk); work file servers (MetaStor SH7400, 3 TB UWD each); cache file servers (dual Sun Ultra2, 400 GB UWD each); linked by Fast and Gigabit Ethernet via Cisco Catalyst 5500 and 2900 switches.]

7  Other farms
Batch farm
  – 180 nodes, growing to 250
Lattice QCD
  – 20-node Alpha (Linux) cluster
  – Parallel application development
  – Plans (a proposal) for a large 256-node cluster
    – Part of a larger collaboration
    – The group wants a "meta-facility": jobs run on the least-loaded cluster (wide-area scheduling)

8  Additional requirements
Ability to handle and schedule parallel jobs (MPI)
Allow collaborators to "clone" the batch systems and software
  – Allow inter-site job submission
  – LQCD is particularly interested in this
Remote data access

9  Components
Batch software
  – Interface to the underlying batch system
Tape software
  – Interface to OSM, overcome its limitations
Data caching strategies
  – Tape staging
  – Data caching
  – File servers

10  Batch software
A layer over the batch management system
  – Allows replacement of the batch system: LSF, PBS (DQS)
  – Constant user interface no matter what the underlying system is
  – The batch farm can still be managed by the management system (e.g. LSF)
  – Builds in a security infrastructure (e.g. GSI), particularly to allow secure remote access
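
A minimal sketch of what such an abstraction layer could look like; the interface, class names, and method signatures below are hypothetical and only illustrate the idea of one user-facing API with pluggable LSF and PBS backends, not the actual JLab code:

```java
// Hypothetical sketch of a thin layer over the underlying batch system;
// names and methods are illustrative, not the JLab API.
import java.util.List;

interface BatchSystem {
    /** Submit a job and return the batch system's job id. */
    String submit(JobRequest request);
    /** Query the current state (e.g. PENDING, RUNNING, DONE) of a job. */
    String status(String jobId);
    /** Remove a job from the queue, or kill it if it is running. */
    void cancel(String jobId);
}

/** A job description that is independent of LSF/PBS syntax. */
class JobRequest {
    String name;
    String queue;
    String command;
    List<String> inputFiles;   // e.g. tape files to pre-stage
}

/** LSF backend: translates the generic request into bsub/bjobs/bkill calls. */
class LsfBackend implements BatchSystem {
    public String submit(JobRequest r) { /* build and run a bsub command */ return "lsf-12345"; }
    public String status(String id)    { /* parse bjobs output */ return "PENDING"; }
    public void cancel(String id)      { /* run bkill */ }
}

/** PBS backend: same interface, with qsub/qstat/qdel underneath. */
class PbsBackend implements BatchSystem {
    public String submit(JobRequest r) { /* build and run a qsub command */ return "pbs-678"; }
    public String status(String id)    { /* parse qstat output */ return "PENDING"; }
    public void cancel(String id)      { /* run qdel */ }
}
```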

11  Batch system – schematic
[Diagram: user processes use the submission and query interfaces (submission, query, statistics) of the job submission system, which is backed by a database and drives the batch control system (LSF, PBS, DQS, etc.) running jobs on the batch processors.]

12  Existing batch software
Has been running for 2 years
  – Uses LSF
  – Multiple jobs: parameterized jobs (LSF now has job arrays; PBS does not have this)
  – The client is trivial to install on any machine with a JRE; no need to install LSF, PBS, etc.
    – Eases licensing issues
    – Simple software distribution
    – Remote access
  – Standardized statistics and bookkeeping outside of LSF (MySQL based)
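
As an illustration of the MySQL-based bookkeeping, here is a hedged JDBC sketch; the table layout, connection URL, and credentials are placeholders, not the actual JLab schema:

```java
// Sketch of recording a submission in a MySQL bookkeeping database via JDBC.
// The jobs table, connection URL, and credentials are assumed for illustration.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class JobBookkeeping {
    public static void recordSubmission(String jobId, String user,
                                        String queue, int nSubJobs) throws Exception {
        try (Connection c = DriverManager.getConnection(
                 "jdbc:mysql://dbhost/farmjobs", "farm", "secret");   // placeholder details
             PreparedStatement st = c.prepareStatement(
                 "INSERT INTO jobs (job_id, user, queue, n_subjobs, submitted) " +
                 "VALUES (?, ?, ?, ?, NOW())")) {
            st.setString(1, jobId);     // id returned by the batch layer
            st.setString(2, user);
            st.setString(3, queue);
            st.setInt(4, nSubJobs);     // parameterized jobs expand to many sub-jobs
            st.executeUpdate();
        }
    }
}
```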

13  Existing software (cont.)
The farm can be managed by LSF
  – Queues, hosts, scheduler, etc.
A rewrite is in progress to:
  – Add a PBS interface (and DQS?)
  – Add a security infrastructure to permit authenticated remote access
  – Clean up

14  PBS as an alternative to LSF
PBS (Portable Batch System, from NASA)
  – Actively developed
  – Open, freely available
  – Handles MPI (PVM)
  – User interface very familiar to NQS/DQS users
  – The problem (for us) was the lack of a good scheduler
    – PBS provides only a trivial scheduler, but
    – provides a mechanism to plug in another
    – We were using hierarchical scheduling in LSF

15  PBS scheduler
Multiple stages (6); each can be used or not as required, in arbitrary order:
  – Match making: matches job requirements to system resources
  – System priority (e.g. data available)
  – Queue selection (which queue runs next)
  – User priority
  – User share: which user runs next, based on user and group allocations and usage
  – Job age
The scheduler has been provided to the PBS developers for comments and is under test
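
The scheduler itself is a PBS plug-in, so the following Java fragment is only an illustration of the staged approach: independent stages that filter or reorder the candidate jobs and can be chained in any order. The stage and class names are hypothetical.

```java
// Illustrative sketch of a staged scheduler: each stage is independent and the
// stages can be configured in any order. Not the actual JLab/PBS scheduler code.
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

interface SchedulingStage {
    /** Filter and/or reorder the candidate jobs; return the surviving list. */
    List<Job> apply(List<Job> candidates);
}

class Job {
    String id;
    String user;
    String queue;
    boolean dataStaged;       // system priority: input data already on disk
    long submitTimeMillis;    // used by the job-age stage
}

/** Example stage: prefer jobs whose input data is already staged. */
class DataAvailableStage implements SchedulingStage {
    public List<Job> apply(List<Job> candidates) {
        List<Job> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparing((Job j) -> !j.dataStaged));
        return sorted;
    }
}

/** Example stage: oldest jobs first (job age). */
class JobAgeStage implements SchedulingStage {
    public List<Job> apply(List<Job> candidates) {
        List<Job> sorted = new ArrayList<>(candidates);
        sorted.sort(Comparator.comparingLong(j -> j.submitTimeMillis));
        return sorted;
    }
}

class StagedScheduler {
    private final List<SchedulingStage> stages;
    StagedScheduler(List<SchedulingStage> stages) { this.stages = stages; }

    /** Run the configured stages in order; the first surviving job runs next. */
    Job selectNext(List<Job> candidates) {
        List<Job> remaining = candidates;
        for (SchedulingStage stage : stages) {
            remaining = stage.apply(remaining);
        }
        return remaining.isEmpty() ? null : remaining.get(0);
    }
}
```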

16  Mass storage
Silo: 300 TB Redwood capacity
  – 8 Redwood drives
  – 5 (+5) 9840 drives
  – Managed by OSM
Bottleneck:
  – Limited to a single data mover
  – That node has no capacity for more drives
  – 1 TB tape-staging RAID disk
5 TB of NFS work areas / caching space

17  Solving tape access problems
Add new drives (9840s)
  – Requires a 2nd OSM instance, transparent to the user
Eventual replacement of OSM, transparent to the user
File pre-staging to the farm
Distributed data caching (not NFS)
Tools to allow user optimization
Charge for (prioritize) mounts

18  OSM
OSM has several limitations (and is no longer supported)
  – The single mover node is the most serious
  – No replacement is possible yet
Local tapeserver software solves many of these problems for us
  – Simple remote clients (Java based); OSM is not needed except on the server

19  Tape access software
Simple put/get interface
  – Handles multiple files, directories, etc.
Can have several OSM instances behind a single file catalog, transparent to the user
  – The system fails over between servers
  – The only way to bring the 9840s on line
Data transfer is a network (socket) copy in Java
Allows a scheduling / user-allocation algorithm to be added to tape access
Will permit "transparent" replacement of OSM
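
A rough sketch of what a socket-based "get" client might look like; the one-line request protocol, server host, and port are assumptions rather than the actual JLab tape-server protocol:

```java
// Hypothetical sketch of a simple "get" client that asks a tape server to
// stream a file over a socket; protocol, host, and port are assumptions.
import java.io.BufferedOutputStream;
import java.io.FileOutputStream;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.PrintWriter;
import java.net.Socket;

public class TapeClient {
    /** Fetch one catalog file from the tape server and write it locally. */
    public static void get(String server, int port,
                           String catalogPath, String localPath) throws Exception {
        try (Socket sock = new Socket(server, port);
             OutputStream out = new BufferedOutputStream(new FileOutputStream(localPath))) {
            // Send a one-line request naming the file in the tape catalog.
            PrintWriter req = new PrintWriter(sock.getOutputStream(), true);
            req.println("GET " + catalogPath);
            // Copy the byte stream the server sends back.
            InputStream in = sock.getInputStream();
            byte[] buf = new byte[64 * 1024];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
    }
}
```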

20  Data pre-fetching & caching
Currently:
  – Tape -> stage disk -> network copy to the farm node's local disk
  – Tape -> stage disk -> NFS cache -> farm
    – But this can cause NFS server problems
Plan:
  – Dual Solaris nodes with ~350 GB of disk (RAID 0), on Gigabit Ethernet
    – Provides a large cache for farm input
  – Stage out entire tapes to the cache
    – Cheaper than staging space, better performance than NFS
    – Scalable as the farm grows
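
A small sketch of the "serve from the cache if present, otherwise stage from tape" logic this plan implies; the paths, port, and the reuse of the hypothetical TapeClient from the previous sketch are purely illustrative:

```java
// Sketch of a cache lookup that falls back to staging from tape.
// cacheRoot would be an area exported by the cache file servers (assumption).
import java.io.File;

public class CacheLookup {
    private final String cacheRoot;

    public CacheLookup(String cacheRoot) { this.cacheRoot = cacheRoot; }

    /** Return a readable local path for the file, staging it from tape if needed. */
    public String locate(String catalogPath) throws Exception {
        File cached = new File(cacheRoot, catalogPath);
        if (cached.exists()) {
            return cached.getPath();               // already in the disk cache
        }
        cached.getParentFile().mkdirs();           // make room for the staged copy
        // Not cached: ask the tape software to stage it (hypothetical call,
        // standing in for the put/get interface sketched earlier).
        TapeClient.get("tapeserver", 9090, catalogPath, cached.getPath());
        return cached.getPath();
    }
}
```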

21  JLab Farm Layout
[Repeat of the FY 2000 farm layout diagram from slide 6.]

22  File pre-staging
Scheduling for pre-staging is done by the job server software
  – Splits/groups jobs by tape (this could be done by the user)
  – Makes a single tape request
  – Holds jobs while files are staged
    – Implemented by batch jobs that release the held jobs
  – Released jobs with data available get high priority
  – Reduces the number of job slots blocked by jobs waiting for data
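
Finally, a sketch of the grouping-and-hold scheme described above: jobs are grouped by the tape holding their input, submitted held, and released by a staging job once that tape has been read into the cache. All class and method names are hypothetical, not the JLab job-server code:

```java
// Sketch of grouping user jobs by tape and holding them until their data is staged.
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PreStagePlanner {
    static class UserJob {
        String id;
        String inputFile;    // file in the tape catalog
    }

    /** Minimal view of the batch layer needed for pre-staging (hypothetical). */
    interface JobServer {
        void submitHeld(UserJob job);                                 // job waits, holding no slot
        void submitStagingJob(String tape, List<UserJob> toRelease);  // releases them when done
    }

    /** Group user jobs by the tape (volume) their input file lives on. */
    static Map<String, List<UserJob>> groupByTape(List<UserJob> jobs,
                                                  Map<String, String> fileToTape) {
        Map<String, List<UserJob>> byTape = new HashMap<>();
        for (UserJob j : jobs) {
            byTape.computeIfAbsent(fileToTape.get(j.inputFile),
                                   t -> new ArrayList<>()).add(j);
        }
        return byTape;
    }

    /** One tape request per tape; user jobs stay held until their data arrives. */
    static void schedule(Map<String, List<UserJob>> byTape, JobServer server) {
        for (Map.Entry<String, List<UserJob>> e : byTape.entrySet()) {
            for (UserJob j : e.getValue()) {
                server.submitHeld(j);
            }
            server.submitStagingJob(e.getKey(), e.getValue());
        }
    }
}
```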

23  Conclusions
PBS is a sophisticated and viable alternative to LSF
The interface layer permits:
  – use of the same jobs on different systems, easing user migration
  – adding features to the batch system

