The GRID and the Linux Farm at the RCF. CHEP 2003, San Diego, March 27, 2003. A. Chan, R. Hogue, C. Hollowell, O. Rind,


1 The GRID and the Linux Farm at the RCF
CHEP 2003, San Diego, March 27, 2003
A. Chan, R. Hogue, C. Hollowell, O. Rind, J. Smith, T. Throwe, T. Wlodek, D. Yu
RHIC Computing Facility, Brookhaven National Laboratory

2 Outline
- Background
- Hardware
- Software
- Security
- GRID-like capabilities
- Near-term plans

3 Background
- Used for mass processing of RHIC data
- U.S. Tier 1 center for ATLAS
- Listed as 3rd largest cluster in http://clusters.top500.org
- Currently staffed with 5 FTE

4 Growth of the Linux Farm

5 Hardware
- Built with commercially available Intel-based servers
- 1097 rack-mounted, dual-CPU servers
- 917,728 SpecInt2000
- Reliable (0.0052 hardware failures per machine-month, about 6 failures/month)
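The failure figure on this slide can be checked with one line of arithmetic. The following is a minimal sketch, assuming the quoted rate applies uniformly to all 1097 servers; the variable names are illustrative and not from the presentation.

```python
# Figures quoted on the Hardware slide.
SERVERS = 1097          # rack-mounted, dual-CPU servers
FAILURE_RATE = 0.0052   # hardware failures per machine-month

# Assumption: the rate is uniform across every server in the farm.
expected_failures_per_month = SERVERS * FAILURE_RATE

print(f"{expected_failures_per_month:.1f} failures/month")
```

The product is roughly 5.7, consistent with the "about 6 failures/month" quoted on the slide.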

6 Breakdown of Hardware Failures

7 The Linux Farm in the RCF

8 The IBM servers

9 The VA Linux servers

10 Hardware (cont.)

Brand      CPU       RAM        Storage     Quantity
VA Linux   450 MHz   0.5-1 GB   9-120 GB    154
VA Linux   700 MHz   0.5 GB     9-36 GB     48
VA Linux   800 MHz   0.5-1 GB   18-480 GB   168
IBM        1.0 GHz   0.5-1 GB   18-144 GB   315
IBM        1.4 GHz   1 GB       36-144 GB   160
IBM        2.4 GHz   1 GB       240 GB      252
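As a consistency check, the quantities in the table can be summed and compared with the 1097 servers quoted on the earlier Hardware slide. This is a small sketch; the data structure and function names are illustrative only.

```python
# Rows transcribed from the hardware table: (brand, CPU, quantity).
HARDWARE = [
    ("VA Linux", "450 MHz", 154),
    ("VA Linux", "700 MHz", 48),
    ("VA Linux", "800 MHz", 168),
    ("IBM", "1.0 GHz", 315),
    ("IBM", "1.4 GHz", 160),
    ("IBM", "2.4 GHz", 252),
]


def total_servers(rows):
    """Sum the quantity column over all rows."""
    return sum(qty for _, _, qty in rows)


def servers_by_brand(rows):
    """Sum the quantity column separately for each brand."""
    totals = {}
    for brand, _, qty in rows:
        totals[brand] = totals.get(brand, 0) + qty
    return totals


print(total_servers(HARDWARE))     # 1097
print(servers_by_brand(HARDWARE))  # {'VA Linux': 370, 'IBM': 727}
```

The total matches the server count given on the Hardware slide, with IBM machines making up about two thirds of the farm.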

11 Software
- Custom version of Red Hat Linux 7.2
- Linux image installed with Kickstart
- Support for an array of compilers (gcc, PGI, Intel) and debuggers (gdb, TotalView, Intel)
- Support for network file systems (AFS, NFS)

12 Software (cont.)
- Support for LSF and MDS-compatible batch software
- Mix of open-source, RCF-built and vendor-supplied systems to monitor and control hardware, software and infrastructure
- Cluster management tools based on open-source software

13 Linux Farm Monitoring

14 Batch Control & Monitoring

15 Linux Farm Usage

16 Remote Power Management

17 Infrastructure Monitoring

18 Security
- Firewall to minimize unauthorized access
- User access via SSH through security-enhanced gateway systems
- Most servers closed to direct external access
- Other security measures being developed

19 Security (cont.)

20 GRID-like capabilities
- Ganglia (monitoring & job scheduler)
- Condor (batch software)
- GLOBUS & LSF batch

21 Ganglia
- Open-source monitoring software (http://sourceforge.net/projects/ganglia)
- Can create a federation of clusters
- Keeps historical data
- Can be used as a job scheduler in a GRID-like environment
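To illustrate how monitoring data could feed a scheduling decision, the sketch below parses a small XML report and picks the least-loaded host. It rests on several assumptions: the sample XML is hand-written in the style of a Ganglia report, the host names and load values are invented, and the selection rule is a toy policy, not Ganglia's or the RCF's actual mechanism. A real deployment would read the report from a running monitoring daemon instead of a string.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample modelled on a Ganglia-style XML report.
# Host names and metric values are invented for illustration.
SAMPLE_XML = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
  <CLUSTER NAME="example-farm">
    <HOST NAME="node001" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="1.75" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="node002" IP="10.0.0.2">
      <METRIC NAME="load_one" VAL="0.10" TYPE="float" UNITS=""/>
    </HOST>
    <HOST NAME="node003" IP="10.0.0.3">
      <METRIC NAME="load_one" VAL="0.95" TYPE="float" UNITS=""/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""


def load_by_host(xml_text, metric="load_one"):
    """Return {host name: metric value} for every host that reports the metric."""
    root = ET.fromstring(xml_text)
    loads = {}
    for host in root.iter("HOST"):
        for m in host.iter("METRIC"):
            if m.get("NAME") == metric:
                loads[host.get("NAME")] = float(m.get("VAL"))
    return loads


def least_loaded(xml_text):
    """Toy scheduling policy: pick the host with the lowest one-minute load."""
    loads = load_by_host(xml_text)
    return min(loads, key=loads.get)


print(least_loaded(SAMPLE_XML))  # node002
```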

22 Ganglia (cont.)
- Web interface
- Prototype at the RCF
- Scalability issues
- Downside: cannot (yet) easily restrict data access; not easily customized

23 Ganglia at the RCF

24 Ganglia at the RCF (1)

25 Ganglia at the RCF (2)

26 Condor
- Open-source software (http://www.cs.wisc.edu/condor)
- Supported by Univ. of Wisconsin
- Full-feature batch software with job-queuing mechanism, scheduling policy, priority scheme, checkpoint capability, resource monitoring & management
- Can connect together multiple remote clusters
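For readers unfamiliar with Condor's job-queuing interface, jobs are described in a plain-text submit description file that is handed to `condor_submit`. The sketch below renders a minimal one from Python. The helper name, the `tag` parameter and the example executable are hypothetical, and nothing here reflects the RCF's actual configuration.

```python
def render_submit_description(executable, arguments="", n_jobs=1, tag="job"):
    """Render a minimal Condor submit description as text.

    The helper and its parameters are illustrative; only the keywords in
    the generated text (universe, executable, arguments, output, error,
    log, queue) are Condor submit-file commands.
    """
    lines = [
        "universe   = vanilla",
        f"executable = {executable}",
        f"arguments  = {arguments}",
        # $(Process) expands to 0, 1, 2, ... for each queued job.
        f"output     = {tag}.$(Process).out",
        f"error      = {tag}.$(Process).err",
        f"log        = {tag}.log",
        f"queue {n_jobs}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: queue three copies of a trivial job.
print(render_submit_description("/bin/hostname", n_jobs=3))
```

Writing the returned text to a file and passing that file to `condor_submit` would queue the jobs on a pool where Condor is installed.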

27 Condor (cont.)
- Interface with GLOBUS via Condor-G
- Prototype for Linux Farm batch access in a GRID-like environment
- Scalability: not yet tested in a very large-scale environment?
- MDS-compatible in the RCF environment?

28 Condor & the batch software

29 GLOBUS & LSF
- GLOBUS tools allow remote users to submit jobs on the local cluster
- ATLAS prototype at the RCF
- Gatekeeper acts as interface between GLOBUS and the local batch system
- LSF job submitted from gatekeeper
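The gatekeeper's role as a translator between a remote job request and the local batch system can be sketched as follows. This is only an illustration under stated assumptions: the dictionary keys are hypothetical (loosely modelled on the kind of attributes a remote job request carries), the function is not Globus code, and only the `bsub` options shown (`-q`, `-n`, `-o`, `-e`) are real LSF flags.

```python
import shlex


def build_bsub_command(job):
    """Translate a job request (a dict with hypothetical keys) into an LSF bsub command."""
    cmd = ["bsub"]
    if "queue" in job:
        cmd += ["-q", job["queue"]]       # target LSF queue
    if "count" in job:
        cmd += ["-n", str(job["count"])]  # number of job slots
    if "stdout" in job:
        cmd += ["-o", job["stdout"]]      # standard output file
    if "stderr" in job:
        cmd += ["-e", job["stderr"]]      # standard error file
    cmd.append(job["executable"])
    cmd += job.get("arguments", [])
    return cmd


# Hypothetical request as it might arrive at the gatekeeper.
request = {
    "executable": "/bin/hostname",
    "queue": "short",
    "stdout": "job.out",
    "stderr": "job.err",
}
print(shlex.join(build_bsub_command(request)))
```

Returning the command as a list rather than a single string keeps arguments with spaces intact if the command is later passed to `subprocess.run`.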

30 GRID & LSF (cont.)

31 GLOBUS & LSF

32 Near-term plans
- Use a mature version of Ganglia for monitoring. Use as job scheduler?
- Roll out Condor as part of new batch software
- Upgrade to LSF v. 5.x (GRID-like features)
- Other GRID-like capabilities?
- Security issues

