Proposal to form a Large Cluster SIG. Alan Silverman, HEPiX, Jefferson Lab, 2nd Nov 2000.

1 Proposal to form a Large Cluster SIG
Alan Silverman, 2nd Nov 2000, HEPiX, Jefferson Lab

2 Overview of the talk
- Why: the rationale for CERN proposing this SIG
- Who: who is or might be interested
- What: what such a SIG could do
- When: the timescale for setup and first actions

3 My Given Mandate
“There is an emerging consensus that an important part of the analysis of LHC data will be performed in "Regional Computing Centres", closely integrated with each other and with the CERN facility to provide as far as possible a single computing environment.”
“It is proposed that we start within HEPIX a special interest group on Large Scale Cluster Management to share ideas and experience between the labs involved in regional centre computing, with a view to minimising the number of overlapping developments and maximising the degree of standardisation of the environment.”

4 Parallel Developments
- Monitoring: PEM (CERN), NGOP (FNAL), GMS (IN2P3)
- Software certification: in progress in 3-4 labs, now or soon, on Solaris 8 and Linux 7.x
- Software installation projects: CERN, DESY
- Remedy trouble ticket workflows: SLAC, CERN, FNAL
- Kerberos 5: CERN (CLASP), FNAL, DESY, …
- GRIDS: European DataGRID, PPDG and GriPhyN

9 European DataGRID WP4 – Fabric Management
The objective of the fabric management work package (WP4) is to develop new automated system management techniques that will enable the deployment of very large computing fabrics constructed from mass market components with reduced systems administration and operations costs. The fabric must support an evolutionary model that allows the addition and replacement of components, and the introduction of new technologies, while maintaining service. The fabric management must be demonstrated in the project in production use on several thousand processors, and be able to scale to tens of thousands of processors.

10 Who might be concerned
- The various GRID projects: only the European DataGRID seems to mention the basic computing fabric as an issue.
- CERN LHC experiment Tier 1 sites
- LHC Tier 2 sites? FNAL?
- FNAL Run II remote sites (soon in production)
- BNL RHIC and remote sites (in production)
- SLAC BaBar and remote sites (in production)
- Basically, all the traditional HEPiX attendees

11 What could a SIG do?
- First, promote appropriate sessions at future HEPiX meetings; perhaps even special meetings
- Make sure each site knows what relevant work is in progress (produce some form of list of work in progress?)
- Be aware of and promote collaboration; perhaps share parts of projects
- Be open to the possibility of people exchanges

12 Some possible concrete examples
These came from my first discussions last week at FNAL (thanks to Lisa and Dane and many others) and from the site reports.
- Certification of future versions of Linux and Solaris
- Security (Kerberos 5), single-site sign-on, common authorisation files, password coordination (Jlab’s password utility)
- Kickstart for clusters?
- …..

13 More examples
- A workshop to write the definitive guide to building and running a cluster: how to choose/select/test the hardware; software installation and upgrade tools; performance management, logging, accounting, alarms, security, etc.
- Add a note on what exists and what might scale to large clusters. Maintain this.
For example ……. (from Chuck Boeheim)

14 Rack Density, Packaging
- Shopping for >= 2 CPU/RU
- Per-unit costs for wiring and power become significant
- Cooling of areas becomes a significant problem (the machine room was designed for water-cooled mainframes)

15 Console Management
- Use console servers that gather 512 lines per server
- Provide SSL and SSH support for staff to connect from anywhere, anytime
- Automatic monitoring of all console traffic
- Power management from console
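As an illustration of what "automatic monitoring of all console traffic" might involve, here is a minimal Python sketch that scans console output for alarming patterns and groups the matches by host. It is not the tool described in the talk: the pattern list, the `(host, text)` input format and the function name `scan_console_lines` are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Hypothetical patterns; a real site would tune these to its own hardware and OS.
ALERT_PATTERNS = [
    re.compile(p, re.IGNORECASE)
    for p in (r"panic", r"i/o error", r"out of memory")
]

def scan_console_lines(lines):
    """Return {host: [matching console lines]} for (host, text) pairs.

    Hosts whose console output matches no pattern are omitted.
    """
    alerts = defaultdict(list)
    for host, text in lines:
        if any(p.search(text) for p in ALERT_PATTERNS):
            alerts[host].append(text)
    return dict(alerts)

if __name__ == "__main__":
    sample = [
        ("node001", "login: "),
        ("node002", "Kernel panic: VFS"),
        ("node003", "booting"),
    ]
    for host, hits in scan_console_lines(sample).items():
        print(host, hits)
```

Taking already-collected `(host, text)` pairs as input keeps the sketch independent of how a particular console server delivers its lines.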

16 Installations
- Using Solaris Jumpstart, one person can install 100s of systems per day
- Trying to get to the same point with Linux
- PXE protocol is not up to the task; boot floppies are still needed

17 Monitoring
- Console monitoring
- Ranger
- Ping
- Switch port reports
- Mail summarizer

18 Cluster = Amplifier
- One mistake generated 4000 emails per hour
- Use mail summarizer to intercept
- Need to give it its own mail server!
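The amplifier effect and the idea of a mail summarizer can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the summarizer used at any of the labs: it assumes messages arrive as `(sender, subject)` pairs and that near-identical alerts differ only in digits (host numbers, counters), which are collapsed to `N` before counting. The names `normalize_subject`, `summarize` and `format_digest` are hypothetical.

```python
import re
from collections import Counter

def normalize_subject(subject):
    """Lower-case the subject and replace digit runs with 'N' so that
    alerts differing only in host number or counter fall into one group."""
    return re.sub(r"\d+", "N", subject.strip().lower())

def summarize(messages):
    """Group (sender, subject) pairs by normalized subject.

    Returns a list of (count, key, first_example), most frequent first.
    """
    counts = Counter()
    examples = {}
    for sender, subject in messages:
        key = normalize_subject(subject)
        counts[key] += 1
        examples.setdefault(key, (sender, subject))
    return [(n, key, examples[key]) for key, n in counts.most_common()]

def format_digest(summary):
    """Render one digest line per group instead of one mail per node."""
    return "\n".join(
        f"{n:6d} x {key} (e.g. from {example[0]})"
        for n, key, example in summary
    )

if __name__ == "__main__":
    # One mistake replicated across a cluster: 4000 near-identical mails.
    flood = [(f"root@node{i:04d}", f"cron failed on node{i:04d}") for i in range(4000)]
    print(format_digest(summarize(flood)))
```

With this grouping, a flood of 4000 per-node messages collapses into a single digest line, which is the point of intercepting the mail before it reaches people's inboxes.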

19 When
- Since last week, actually (information-gathering visit to FNAL, a CMS Tier 1 Centre)
- Various discussions this week (and next week at BNL, an ATLAS Tier 1 Centre)
- A half- or full-day session at the next and all future HEPiX meetings on cluster subjects
- From now until then, information gathering. Please send me information about possibly relevant work in progress

