The SLAC Cluster
Chuck Boeheim, Assistant Director, SLAC Computing Services

Components
- Solaris Farm: 900 single-CPU units
- Linux Farm: 512 dual-CPU units
- AFS: 7 servers, 3 TB
- NFS: 21 servers, 16 TB
- Objectivity: 94 servers, 52 TB
- LSF: master, backup, license servers
- HPSS: master + 10 tape movers
- Interactive: 25 servers + E10000
- Build Farm: 12 servers
- Network: 9 Cisco 6509 switches
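As a rough illustration of the scale, here is a minimal Python sketch that totals hosts, farm CPUs, and server disk from the table above. The numbers come from the list; the field names and the idea of keeping the inventory in code are assumptions for illustration, not how SLAC tracked its hardware.

```python
# Hypothetical inventory built from the component counts listed above.
CLUSTER = {
    "Solaris Farm": {"units": 900, "cpus_per_unit": 1},
    "Linux Farm":   {"units": 512, "cpus_per_unit": 2},
    "AFS":          {"units": 7,   "storage_tb": 3},
    "NFS":          {"units": 21,  "storage_tb": 16},
    "Objectivity":  {"units": 94,  "storage_tb": 52},
    "Interactive":  {"units": 25},
    "Build Farm":   {"units": 12},
}

total_hosts = sum(c["units"] for c in CLUSTER.values())
farm_cpus = sum(c["units"] * c["cpus_per_unit"]
                for c in CLUSTER.values() if "cpus_per_unit" in c)
disk_tb = sum(c.get("storage_tb", 0) for c in CLUSTER.values())
print(f"{total_hosts} hosts, {farm_cpus} farm CPUs, {disk_tb} TB of server disk")
```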

Staffing
- System Admin: 7
- Mass Storage: 3
- Applications: 3
- Batch: 1
- Operations: 4
- Operators: 0
The same staff supports most Unix desktops on site.

Growth in Systems

Growth in Staffing

Ratio of Systems/Staff

Physical
- Racking, power, cooling, seismic, network
- Remote power management
- Remote console management
- Installation
- Burn-in, DOAs
- Maintenance
- Replacement burn-in
- Divergence from original models
- Locating a machine (see the sketch after this list)
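Locating a machine in a farm this size is mostly a bookkeeping problem. Below is a minimal sketch that maps a hostname to its rack, slot, and remote power/console ports; the CSV inventory file and its column names are assumptions for illustration, not SLAC's actual tooling.

```python
"""Hypothetical machine-location lookup for a farm inventory.
Assumes inventory.csv with columns: host,building,rack,slot,power_port,console_port."""
import csv
import sys

INVENTORY = "inventory.csv"

def locate(hostname):
    # Linear scan of the inventory file; fine for a few thousand rows.
    with open(INVENTORY, newline="") as f:
        for row in csv.DictReader(f):
            if row["host"] == hostname:
                return row
    return None

if __name__ == "__main__":
    info = locate(sys.argv[1])
    if info is None:
        sys.exit(f"{sys.argv[1]}: not found in inventory")
    print(f"{info['host']}: building {info['building']}, rack {info['rack']}, "
          f"slot {info['slot']}, power port {info['power_port']}, "
          f"console port {info['console_port']}")
```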

Networking
- Gb to servers
- 100 Mb to farm nodes
- Speed matching (problems) at switches
- Network glitches and storms
- Network monitoring (sketched below)
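For farm-node monitoring, a reachability sweep is the simplest starting point. The sketch below pings every node in parallel and lists the ones that do not answer; the node-name pattern and the Linux ping flags (-c, -W) are assumptions, not a description of the monitoring SLAC actually ran.

```python
"""Minimal farm reachability sweep (a sketch, not SLAC's monitoring)."""
import subprocess
from concurrent.futures import ThreadPoolExecutor

NODES = [f"farm{n:03d}" for n in range(1, 513)]  # hypothetical node names

def alive(host):
    # One ICMP echo with a 2-second timeout (Linux ping flags).
    result = subprocess.run(["ping", "-c", "1", "-W", "2", host],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0

if __name__ == "__main__":
    with ThreadPoolExecutor(max_workers=64) as pool:
        status = list(pool.map(alive, NODES))
    down = [host for host, ok in zip(NODES, status) if not ok]
    print(f"{len(NODES) - len(down)}/{len(NODES)} nodes responding")
    for host in down:
        print("DOWN:", host)
```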

System Admin
- Network install (256 machines in < 1 hr)
- Patch management
- Power up/down
- Nightly maintenance
- System Ranger (monitor)
- Report summarization (sketched below)
“A Cluster is a large Error Amplifier”
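Report summarization follows from the error-amplifier quip: one bad configuration shows up as the same message on hundreds of nodes, so nightly reports stay readable only if identical messages are collapsed. A minimal sketch, assuming input lines of the form "host: message" on stdin (the format is an assumption, not the real report feed):

```python
"""Collapse identical messages from many hosts into one summary line each.
Assumed input format on stdin: "<host>: <message>" per line."""
import sys
from collections import defaultdict

def summarize(lines):
    hosts_by_msg = defaultdict(set)
    for line in lines:
        host, sep, msg = line.rstrip("\n").partition(": ")
        if sep:
            hosts_by_msg[msg].add(host)
    # Most widespread messages first.
    for msg, hosts in sorted(hosts_by_msg.items(), key=lambda kv: -len(kv[1])):
        sample = ", ".join(sorted(hosts)[:5])
        more = "" if len(hosts) <= 5 else f" (+{len(hosts) - 5} more)"
        print(f"{len(hosts):4d} hosts  {msg}  [{sample}{more}]")

if __name__ == "__main__":
    summarize(sys.stdin)
```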

User Application Issues
- Workload scheduling
  - Startup effects
  - Distribution vs. hot spots
- System and network limits
  - File descriptors (see the sketch after this list)
  - Memory
  - Cache contention
  - NIS, DNS, AMD
- Job scheduling
- Test beds
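Two of the items above, startup effects and file-descriptor limits, lend themselves to small guard rails in a job wrapper. The sketch below is hypothetical as written: the limit and stagger values are assumptions, and the resource module is Unix-only. It staggers job start times so hundreds of simultaneous starts do not hammer NIS, DNS, and the automounter at once, and raises the soft file-descriptor limit toward the hard limit if the job needs more.

```python
"""Hypothetical job-wrapper guard rails: stagger startup and check fd limits."""
import random
import resource   # Unix-only
import time

MIN_FDS = 1024            # assumed requirement for the job
MAX_STAGGER_SECONDS = 30  # assumed spread for simultaneous job starts

def stagger_startup():
    # Spread starts over a random window so name-service and automounter
    # lookups from hundreds of jobs do not arrive at the same instant.
    time.sleep(random.uniform(0, MAX_STAGGER_SECONDS))

def ensure_fd_limit():
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft < MIN_FDS:
        # Raise the soft limit as far as the hard limit allows.
        resource.setrlimit(resource.RLIMIT_NOFILE, (min(MIN_FDS, hard), hard))

if __name__ == "__main__":
    stagger_startup()
    ensure_fd_limit()
    print("startup staggered, fd limit checked; launch the job here")
```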