Production Manager’s Report PMB Jeremy Coles 13th September 2004



Site status information (1)

Site status information (2) Up-to-date status at:

Headline news

Tier-1
- CPU upgrades in place for some time now
- Disk servers not yet in production: awaiting riser card revision for all machines, as the current series continue to break (25% failure rate)

Tier-2s
- RAL-PPD is now supporting the BaBar VO
- QMUL unable to upgrade until the new port of LCG2 is working
- Liverpool is ramping up CPUs
- Edinburgh and Brunel recently completed tests

General
- BaBar expect to upgrade farms to SLC3 this autumn
- GridPP is now offering about 1900 job slots (almost equal to the number of CPUs)

Deployment areas (1)

Security
- 2 incidents at CERN

Middleware
- C&T will issue a release each month, but this will not necessarily become a deployment release (see slide on the pre-production service)

Fabric
- Front-end nodes for UK sites due to be delivered 14th September; many sites waiting
- LCFG will not be supported under SLC3, and CERN will not formally release Quattor updates. To be discussed in a parallel session this week

Documentation
- Identified need for a site administrator's guide book
- Intend to start using the GOC news update service for site information

Support
- No change yet. The EGEE operational plan is due to be discussed at the CERN workshop 1-3 November, and also at the EGEE CIC meeting today

EGEE pre-production service

Roadmap for migration of the pre-production service to gLite, based on the following assumptions:
- JRA1 release plan v1.3
- JRA1 testing takes 4 weeks
- SA1 certification takes 2 weeks
- Pre-production takes software only after certification is complete
- No problems found in either JRA1 testing or SA1 certification

Component          Available from JRA1    In pre-production (earliest)
R-GMA              mid September          mid October
CE                 end September          end October
Metadata Cat.      end September          end October
File I/O           end September          end October
Accounting         end October            end November
Data Scheduler     end October            end November
File Cat.          end October            end November
File Transfer      end October            end November
Logging & Book.    end October            end November
Replica Cat.       end October            end November
SE                 end October            end November
VOMS               end October            end November
WMS                end October            end November
SRM                mid January '05        early February '05
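The roadmap's stated assumptions (component availability from JRA1, plus four weeks of JRA1 testing, plus two weeks of SA1 certification, with pre-production only after certification) amount to a simple date calculation. The sketch below is illustrative only; the helper name is invented, and the slide's own dates are given at month granularity rather than computed this way:

```python
from datetime import date, timedelta

# Durations taken from the roadmap assumptions on the slide.
JRA1_TESTING = timedelta(weeks=4)   # JRA1 testing takes 4 weeks
SA1_CERTIFICATION = timedelta(weeks=2)  # SA1 certification takes 2 weeks

def earliest_preproduction(available_from_jra1: date) -> date:
    """Earliest date a component can enter pre-production, assuming
    no problems are found in JRA1 testing or SA1 certification."""
    return available_from_jra1 + JRA1_TESTING + SA1_CERTIFICATION

# Example: a component delivered by JRA1 on 15 September 2004
# would reach pre-production no earlier than 27 October 2004.
earliest = earliest_preproduction(date(2004, 9, 15))
```

Any slip in the JRA1 delivery date shifts the pre-production date by the same amount, which is why the table's "earliest" column tracks the JRA1 column so closely.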

Deployment areas (2)

Procedures
- Also part of EGEE work; more work needs to be done in this area

Accounting & monitoring
- The deliverable for December will probably cover PBS and LSF

Metrics
- Some progress, but clarity still required (see later slide)

Deployment plan
- Some progress in identifying dependencies, but more work to be done
- Proving difficult to get dates from various areas!
- ..\Planning\GridPP CP v0.1.ppt

Operational issues (selection)

- Slow response from sites: upgrades, response to problems, etc. Problems reported daily; some problems last for weeks
- Lack of staff available to fix problems: all on vacation, ...
- Misconfigurations (see next slide)
- Lack of configuration management: problems that are fixed reappear
- Lack of fabric management. Is it GDA's responsibility to provide solutions to these problems? If so, we need more available effort (see slide on workshops etc.)
- Lack of understanding (training?): admins reformat disks of the SE ...
- Firewall issues: often no good coordination between grid admins and firewall maintainers
- PBS problems: are we seeing the scaling limits of PBS?
- Forgetting to read documentation ...

From Ian Bird's talk, GDB 8th September

Site (mis)configurations

Site misconfiguration was responsible for most of the problems that occurred during the experiments' Data Challenges. An incomplete list of problems:
- The variable VO_SW_DIR points to a non-existent area on WNs
- The ESM is not allowed to write in the area dedicated to software installation
- Only one certificate allowed to be mapped to the ESM local account
- Wrong information published in the information system (GLUE object classes not linked)
- Queue time limits published in minutes instead of seconds, and not normalized
- /etc/ld.so.conf not properly configured; shared libraries not found
- Machines not synchronized in time
- Grid-mapfiles not properly built
- Pool accounts not created, but the rest of the tools configured with pool accounts
- Firewall issues
- CA files not properly installed
- NFS problems for home directories or ESM areas
- Services configured to use the wrong BDII
- Wrong user profiles
- Default user shell environment too big

From Ian Bird's talk, GDB 8th September
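Several items in this list (time limits published in minutes rather than seconds, a software area that does not exist on the worker node, unsynchronized clocks) are mechanical enough to check automatically before a site joins production. The sketch below is purely illustrative: the configuration keys and thresholds are invented for the example and do not correspond to any actual LCG tool or schema:

```python
import os
import time

def check_site_config(cfg: dict) -> list[str]:
    """Return human-readable problems found in a site configuration.

    The dict schema here is hypothetical, chosen to mirror three of
    the misconfigurations listed on the slide.
    """
    problems = []

    # Queue time limits should be published in seconds; a value under
    # an hour suggests minutes were published instead (heuristic only).
    if cfg.get("max_cpu_time", 0) < 3600:
        problems.append("max_cpu_time suspiciously small: minutes instead of seconds?")

    # The VO software directory must actually exist on the worker node.
    sw_dir = cfg.get("vo_sw_dir", "")
    if not os.path.isdir(sw_dir):
        problems.append(f"VO software dir does not exist: {sw_dir!r}")

    # Compare the worker node's reported clock against our own; a large
    # skew suggests the machine is not time-synchronized.
    skew = abs(cfg.get("wn_clock", time.time()) - time.time())
    if skew > 60:
        problems.append(f"worker-node clock off by {skew:.0f}s: NTP not running?")

    return problems
```

A site would run such checks locally and fix anything reported before the problem surfaces as a failed Data Challenge job, which is the configuration-management gap the previous slide complains about.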

ATLAS on LCG Preliminary UK contribution ~20%. The third biggest contribution is from RAL.

Phase 1 statistics from LHCb

DC'04 is split into 3 phases:
1) Production: MC simulation (done)
2) Stripping: event pre-selection (to start soon)
3) Analysis (in preparation)

424 CPU·years

Other VOs

- ALICE - active
- BaBar - active
- CMS - active
- D0 - active
- ZEUS VO - active

"It may interest you to know that we have successfully processed Monte Carlo events on your cluster at RAL so far, and that virtually all of our tests there have been successful." (James Ferrando, DESY, 9th September 2004)

No clear picture of what is happening on SAMGrid.

What is the status of ...?

- ANTARES (high-energy neutrino telescope in the Mediterranean): case for GridPP2 application interface staff
- CALICE: namely London (Imperial and UCL), NorthGrid (Manchester) and SouthGrid (Cambridge, Birmingham and RAL)
- D0: the seamless use of SAMGrid in cooperation with PPDG, based on EGEE and LCG; integration by 2007
- MICE at RAL
- PhenoGrid
- UK Dark Matter Collaboration
- UKQCD

Deployment questions

- Do we need a UK middleware integration testbed?
- What level of network monitoring is required (and at which site(s))?
- How does resource allocation work across Tier-2s?
- What validation is required of MoU obligations?
- Will we formally define the GridPP security policy?
- What does "available" mean? => Tier-2 or Deployment Board
- UK Tier-1 supporting non-UK Tier-2s

Metrics (1)

Metrics (2)