Oxford PP Computing Site Report, HEPSYSMAN, 28th April 2003, Pete Gronbech.

1 Oxford PP Computing Site Report, HEPSYSMAN, 28th April 2003, Pete Gronbech

2 General Strategy: Approx. 200 Windows 2000 desktop PCs, with Exceed used to access central Linux systems. Digital Unix and VMS have been phased out for general use. Red Hat Linux 7.3 is becoming the standard.

3 Network Access [Network diagram: SuperJanet 4 (2.4Gb/s link), OUCS Firewall, Campus Backbone Router, Backbone Edge Routers serving departments, Physics Firewall and Physics Backbone Router. Links are labelled 100Mb/s or 1Gb/s.]

4 Physics Backbone Upgrade to Gigabit, Autumn 2002 [Network diagram: Physics Firewall, Physics Backbone Router and a Gb/s switch serving desktops and servers (Linux server, Win 2k server). Sub-departments shown: Particle Physics, Clarendon Lab, Astro, Theory and Atmos. Links are labelled 100Mb/s or 1Gb/s.]

5 [Diagram of Linux systems on the 1Gb/s network. Hosts shown: pplx1, morpheus, pplxfs1, pplxgen, pplx2, ppcresst1, ppcresst2, ppatlas1, atlassbc, ppminos1, ppminos2, grid, pplxbatch, pptb01, pptb02, pplx3 (SNO), ppnt117 (HARP), tblcfg, tbse01, tbce01. Group labels: General Purpose Systems, CDF, minos DAQ, Atlas DAQ, cresst DAQ, Grid Development (edg ui, sam testing), PBS Batch Farm (Autumn 2002, 4 * dual 2.4GHz systems, RH 7.3). Operating systems shown: RH 6.2, RH 7.1, RH 7.3 and Fermi 7.3.1.]

6 General Purpose Systems [Diagram: pplxfs1, pplxgen and pplx2 on the 1Gb/s network, labelled RH 7.3 and RH 6.2. PBS Batch Farm, Autumn 2002: 4 * dual 2.4GHz systems running RH 7.3.]

7 Zero-D X-3i SCSI-IDE RAID: 12 * 160GB Maxtor drives, supplied by Compusys. This proved to be a disaster and was rejected in favour of bare SCSI disks, which we mounted internally in our rack-mounted file server.

8 The Linux File Server: pplxfs1. 8 * 146GB SCSI disks.

9 General Purpose Linux Server: pplxgen. pplxgen is a dual 2.2GHz Pentium 4 Xeon based system with 2GB RAM. It is running Red Hat 7.3. It was brought online at the end of August 2002 to share the load with pplx2 as users migrated off al1 (the Digital Unix server).

10 The PP batch farm, running Red Hat 7.3 with OpenPBS, can be seen below pplxgen. This service became fully operational in Feb 2003.

11 [Diagram of CDF and grid development systems on the 1Gb/s network, February 2003. Hosts shown: pplx1 (new), morpheus, matrix, node1, node9, cdfsam, grid, pplxbatch, pptb01, pptb02, tblcfg, tbse01, tbce01, tbwn01, tbwn02, tbgen01. Group labels: CDF, Grid Development (edg ui, sam testing), LHCB MC. Operating systems shown: RH 6.1, RH 6.2, RH 7.1, RH 7.3 and Fermi 7.3.1.]

12 Grid development systems, including the EDG software testbed setup.

13 New Linux Systems: Morpheus is an IBM x370 8-way SMP 700MHz Xeon with 4GB RAM and 1TB of Fibre Channel disks. Installed August 2001. Purchased as part of a JIF grant for the CDF group. Runs Red Hat 7.1. Will use CDF software developed at Fermilab and here to process data from the CDF experiment.

14 Tape backup is provided by a Qualstar TLS4480 tape robot with 80 slots and dual Sony AIT-3 drives. Each tape can hold 100GB of data. Installed January 2002. NetVault software from BakBone is used, running on morpheus, for backup of both CDF and particle physics systems.
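As a rough check on what the tape robot can hold, the slide's two figures (80 slots, 100GB per tape) multiply out to about 8TB. The sketch below is illustrative only: the function name is hypothetical, and it assumes every slot holds a data tape (no slot reserved for a cleaning cartridge) and no compression.

```python
# Native (uncompressed) capacity of a tape library: slots x capacity per tape.
SLOTS = 80         # from the slide
GB_PER_TAPE = 100  # from the slide


def library_capacity_gb(slots: int, gb_per_tape: int) -> int:
    """Total capacity in GB, assuming every slot holds a data tape."""
    return slots * gb_per_tape


total_gb = library_capacity_gb(SLOTS, GB_PER_TAPE)
print(f"{total_gb} GB (about {total_gb / 1000:g} TB)")
```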

15 Second round of CDF JIF tender: Dell cluster, MATRIX. 10 dual 2.4GHz P4 Xeon servers running Fermi Linux 7.3.1 and SCALI cluster software. Installed December 2002.

16 Approx. 7.5TB of SCSI RAID 5 disks is attached to the master node. Each shelf holds 14 146GB disks. These are shared via NFS with the worker nodes. OpenPBS batch queuing software is used.
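The "approx 7.5TB" figure can be sanity-checked with RAID 5 arithmetic, where one disk's worth of space per array goes to parity. The slide does not say how many shelves there are or how the arrays are laid out, so the sketch below assumes four shelves, each configured as a single RAID 5 set with no hot spares. Those assumptions and the function name are illustrative, not taken from the talk.

```python
DISKS_PER_SHELF = 14  # from the slide
DISK_GB = 146         # from the slide
ASSUMED_SHELVES = 4   # assumption: the shelf count is not stated in the slides


def raid5_usable_gb(disks: int, disk_gb: int) -> int:
    """Usable space of one RAID 5 set: one disk's worth is lost to parity."""
    if disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    return (disks - 1) * disk_gb


per_shelf_gb = raid5_usable_gb(DISKS_PER_SHELF, DISK_GB)
total_gb = ASSUMED_SHELVES * per_shelf_gb
print(f"per shelf: {per_shelf_gb} GB, total: {total_gb} GB")
```

Under these assumptions each shelf yields 1898GB and four shelves give 7592GB, which is in line with the quoted figure. A layout with hot spares or a different shelf count would give a different total.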

17 Plenty of space in the second rack for expansion of the cluster.

18 LHCb Monte Carlo Setup [Diagram: compute node, an 8-way 700MHz Xeon server (RH 6.2, OpenAFS, OpenPBS); grid gateway, grid (RH 6.2, Globus 1.1.3, OpenAFS, OpenPBS).] The 8-way SMP has now been reloaded as an MS Windows Terminal Server, and LHCb MC jobs will be run on the new PP farm.

19 Problems: IDE RAID proved to be unreliable and caused lots of downtime. Problems with NAT (using iptables caused NFS problems and hangs), solved by dropping NAT and using real IP addresses for the PP farm. Trouble with ext3 journal errors. Hackers…

20 Problems: Lack of manpower! The number of operating systems is slowly reducing; Digital Unix and VMS are very nearly gone. NT4 is also practically eliminated. Getting closer to standardising on RH 7.3, especially as the EDG software is now heading that way. Still finding it very hard to support laptops, but we now have a standard clone and recommend IBM laptops. It would be good to have more time to concentrate on security… (see later talk).
