Proposal to form a Large Cluster SIG
Alan Silverman, 2 Nov 2000
HEPiX – Jefferson Lab

Overview of the talk
- Why – the rationale for CERN proposing this SIG
- Who – who is or might be interested
- What – what such a SIG could do
- When – the timescale for setup and first actions

My Given Mandate
"There is an emerging consensus that an important part of the analysis of LHC data will be performed in 'Regional Computing Centres', closely integrated with each other and with the CERN facility to provide as far as possible a single computing environment."
"It is proposed that we start within HEPiX a special interest group on Large Scale Cluster Management to share ideas and experience between the labs involved in regional centre computing, with a view to minimising the number of overlapping developments and maximising the degree of standardisation of the environment."

Parallel Developments
- Monitoring – PEM (CERN), NGOP (FNAL), GMS (IN2P3)
- Software certification in progress in 3-4 labs now or soon, on Solaris 8 and Linux 7.x
- Software installation projects – CERN, DESY
- Remedy trouble-ticket workflows – SLAC, CERN, FNAL
- Kerberos 5 – CERN (CLASP), FNAL, DESY, …
- GRIDs – European DataGrid, PPDG and GriPhyN

European DataGRID WP4 – Fabric Management
The objective of the fabric management work package (WP4) is to develop new automated system management techniques that will enable the deployment of very large computing fabrics constructed from mass-market components with reduced systems administration and operations costs. The fabric must support an evolutionary model that allows the addition and replacement of components, and the introduction of new technologies, while maintaining service. The fabric management must be demonstrated in the project in production use on several thousand processors, and be able to scale to tens of thousands of processors.

Who might be concerned
- The various GRID projects – only the European DataGRID seems to mention the basic computing fabric as an issue
- CERN
- LHC experiment Tier 1 sites
- LHC Tier 2 sites?
- FNAL? FNAL Run II remote sites (soon in production)
- BNL RHIC and remote sites (in production)
- SLAC BaBar and remote sites (in production)
- Basically – all the traditional HEPiX attendees

What could a SIG do?
- First, promote appropriate sessions at future HEPiX meetings; perhaps even special meetings
- Make sure each site knows what relevant work is in progress (produce some form of list of work in progress?)
- Be aware of related efforts and promote collaboration; perhaps share parts of projects
- Be open to the possibility of people exchanges

Some possible concrete examples
These came from my first discussions last week at FNAL (thanks to Lisa and Dane and many others) and from the site reports:
- Certification of future versions of Linux and Solaris
- Security (Kerberos 5), single-site sign-on, common authorisation files, password coordination (JLab's password utility)
- Kickstart for clusters?
- …

More examples
- A workshop to write the definitive guide to building and running a cluster: how to choose/select/test the hardware; software installation and upgrade tools; performance management, logging, accounting, alarms, security, etc.
- Add a note on what exists and what might scale to large clusters; maintain this
- For example … (the following slides are from Chuck Boeheim, SLAC)

Rack Density, Packaging
- Shopping for >= 2 CPU/RU
- Per-unit costs for wiring and power become significant
- Cooling of areas becomes a significant problem (the machine room was designed for water-cooled mainframes)
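To put rough numbers on the cooling point (assumed figures, not from the talk): a 42U rack populated at 2 CPUs per rack unit holds 84 processors; at around 100 W per processor that is on the order of 8 kW of heat per rack, far denser than the load a machine room laid out for a few water-cooled mainframes was designed to remove as hot air.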

Console Management
- Use console servers that gather 512 lines per server
- Provide SSL and SSH support for staff to connect from anywhere, anytime
- Automatic monitoring of all console traffic
- Power management from console
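One way to make "automatic monitoring of all console traffic" concrete is a small scanner over whatever the console servers write to disk. The sketch below is only an illustration: the log directory, file naming and alarm patterns are assumptions for the example, not a description of the tooling actually used at SLAC.

```python
#!/usr/bin/env python
"""Toy console-log watcher: scan per-node console logs for alarm patterns."""
import glob
import os
import re

# Patterns that usually mean a node is in trouble (assumed examples).
ALARM_PATTERNS = [
    re.compile(r"kernel panic", re.IGNORECASE),
    re.compile(r"i/o error", re.IGNORECASE),
    re.compile(r"out of memory", re.IGNORECASE),
]

def scan_console_logs(log_glob="/var/consoles/*.log"):
    """Return (node, line) pairs for every console line matching an alarm pattern."""
    hits = []
    for path in glob.glob(log_glob):
        node = os.path.splitext(os.path.basename(path))[0]
        with open(path, errors="replace") as f:
            for line in f:
                if any(p.search(line) for p in ALARM_PATTERNS):
                    hits.append((node, line.rstrip()))
    return hits

if __name__ == "__main__":
    for node, line in scan_console_logs():
        print(f"{node}: {line}")
```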

Installations
- Using Solaris Jumpstart, one person can install 100s of systems per day
- Trying to get to the same point with Linux
- PXE protocol is not up to the task; still need boot floppies
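On the Linux side the usual counterpart to Jumpstart was Red Hat's Kickstart (raised as "Kickstart for clusters?" earlier). As a hedged illustration of the configuration side of "one person installs hundreds of systems", the sketch below stamps out per-node Kickstart files from one template; the template contents, host list and directory layout are invented for the example and do not describe any site's real setup.

```python
#!/usr/bin/env python
"""Sketch: generate per-node Kickstart files from a single template."""
import os
from string import Template

# Minimal, illustrative template; a real Kickstart file has many more directives.
KICKSTART_TEMPLATE = Template("""\
install
network --bootproto static --ip $ip --hostname $hostname
rootpw --iscrypted $rootpw_hash
%packages
@ Base
""")

def write_kickstart_files(nodes, outdir="ks"):
    """nodes: iterable of (hostname, ip) pairs; writes one Kickstart file per node."""
    os.makedirs(outdir, exist_ok=True)
    for hostname, ip in nodes:
        cfg = KICKSTART_TEMPLATE.substitute(
            hostname=hostname, ip=ip, rootpw_hash="$1$example$hash")
        with open(os.path.join(outdir, f"{hostname}.cfg"), "w") as f:
            f.write(cfg)

if __name__ == "__main__":
    # Hypothetical farm of ten nodes on a private subnet.
    farm = [(f"node{i:03d}", f"192.168.1.{i}") for i in range(1, 11)]
    write_kickstart_files(farm)
```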

Monitoring
- Console monitoring
- Ranger
- Ping
- Switch port reports
- Mail summarizer
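A "Ping" check in this context is typically just a periodic sweep of the farm flagging nodes that stop answering. A minimal sketch follows, assuming the node names and the use of the system ping command; the real tools listed above (Ranger, NGOP, PEM, ...) do far more than this.

```python
#!/usr/bin/env python
"""Sketch of a ping sweep: report farm nodes that stop answering."""
import subprocess

def unreachable(hosts, timeout_s=2):
    """Return the subset of hosts that do not answer a single ICMP echo."""
    down = []
    for host in hosts:
        result = subprocess.run(
            ["ping", "-c", "1", "-W", str(timeout_s), host],
            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            down.append(host)
    return down

if __name__ == "__main__":
    # Hypothetical node names; a real sweep would read the farm inventory.
    farm = [f"node{i:03d}" for i in range(1, 6)]
    for host in unreachable(farm):
        print(f"ALARM: {host} not responding to ping")
```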

Cluster = Amplifier
- One mistake generated a flood of e-mail messages per hour
- Use the mail summarizer to intercept it
- Need to give it its own mail server!
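The point of the mail summarizer is that one fault repeated across hundreds of nodes should arrive as a single report, not hundreds of messages. A toy sketch of the aggregation step follows, assuming alarm mails can be keyed on their subject line; the actual SLAC summarizer is not described in the talk.

```python
#!/usr/bin/env python
"""Toy mail summarizer: collapse floods of identical alarm mails."""
from collections import Counter

def summarize(messages):
    """messages: iterable of (node, subject) pairs from the alarm mailbox.

    Returns one summary line per distinct subject instead of one mail per node.
    """
    counts = Counter(subject for _node, subject in messages)
    return [f"{n} node(s): {subject}" for subject, n in counts.most_common()]

if __name__ == "__main__":
    # Hypothetical flood: the same disk-full alarm from 200 nodes plus one other alarm.
    flood = [(f"node{i:03d}", "disk /scratch full") for i in range(1, 201)]
    flood += [("node042", "high load average")]
    for line in summarize(flood):
        print(line)
```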

When
- Since last week, actually (information-gathering visit to FNAL, a CMS Tier 1 centre)
- Various discussions this week (and next week at BNL, an ATLAS Tier 1 centre)
- A half- or full-day session on cluster subjects at the next and all future HEPiX meetings
- From now until then, information gathering. Please send me information about possibly relevant work in progress.