LCG Operations
Ian Bird, LCG Deployment Area Manager & EGEE Operations Manager
IT Department, CERN
Presentation to HEPiX, 22nd October 2004

Grid Operations: Scope of Responsibilities

Certification activities
- Certification of middleware as a coherent set of services
- Preparing that package for deployment

Operational and support activities
- Coordinating and supporting the deployment to collaborating computer centres
- Coordinating Grid Operations activities
- Providing operational support
- Providing operational security support
- Providing user support
- CA management
- VO registration and management

Policy
- CA and user registration policies
- Operational policy
- Security policies
- Resource usage and access policies

LHC Computing Model (simplified!!)

Tier-0 – the accelerator centre
- Filter → raw data → reconstruction → event summary data (ESD)
- Record the master copy of raw data and ESD

Tier-1
- Managed mass storage – permanent storage of raw, ESD, calibration data, metadata, analysis data and databases → grid-enabled data service
- Data-heavy (ESD-based) analysis
- Re-processing of raw data
- National, regional support
- "Online" to the data acquisition process: high availability, long-term commitment

Tier-2
- Well-managed, grid-enabled disk storage
- End-user analysis – batch and interactive
- Simulation

[Diagram: Tier-1 and Tier-2 centres – RAL, IN2P3, FNAL, USC, Krakow, CIEMAT, Rome, Taipei, LIP, CSCS, Legnaro, UB, IFCA, IC, MSU, Prague, Budapest, Cambridge, IFIC, NIKHEF, TRIUMF, CNAF, FZK, BNL, PIC, ICEPP, Nordic, … – down to small centres, desktops and portables]
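To make the division of responsibilities concrete, here is a minimal illustrative sketch (not from the original slides; the class and field names are invented) of the tier roles as a data structure:

```python
from dataclasses import dataclass

@dataclass
class Tier:
    """Illustrative model of one tier's role; all names here are hypothetical."""
    name: str
    storage: str          # the storage commitment the tier makes
    data_products: list   # data it holds or produces
    activities: list      # work it carries out

computing_model = [
    Tier("Tier-0 (accelerator centre)",
         "master copy of raw data and ESD",
         ["raw data", "event summary data (ESD)"],
         ["filtering", "reconstruction"]),
    Tier("Tier-1",
         "managed mass storage (permanent), grid-enabled data service",
         ["raw", "ESD", "calibration data", "metadata", "analysis data", "databases"],
         ["ESD-based analysis", "re-processing of raw data", "national/regional support"]),
    Tier("Tier-2",
         "well-managed, grid-enabled disk storage",
         ["simulated data", "analysis data"],
         ["end-user analysis (batch and interactive)", "simulation"]),
]

for tier in computing_model:
    print(f"{tier.name}: stores {tier.storage}; runs {', '.join(tier.activities)}")
```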

[World map of deployed LCG-2 and Grid3 sites]
- LCG-2: 25 universities, 4 national labs, 2800 CPUs
- Grid3: 30 sites, 3200 CPUs
- Total: 78 sites, ~9000 CPUs, 6.5 PByte

Operations services for LCG

Operational support
- Hierarchical model:
  - CERN acts as 1st-level support for the Tier-1 centres
  - Tier-1 centres provide 1st-level support for their associated Tier-2s
  - Tier-1s → "primary sites"
- Grid Operations Centres (GOC)
  - Provide operational monitoring, troubleshooting, coordination of incident response, etc.
  - RAL (UK) led a sub-project to prototype a GOC
  - A 2nd GOC in Taipei is now in prototype

User support
- Central model
  - FZK provides the user support portal
    - Problem tracking system, web-based and available to all LCG participants
  - Experiments provide triage of problems
  - The CERN team provides in-depth support and support for integrating experiment software with the grid middleware
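A rough way to read the hierarchical model above is as a routing rule: a Tier-1 problem goes to CERN, while a Tier-2 problem goes to its associated Tier-1 ("primary site"). The sketch below is purely illustrative – the function, the fallback to the GOC, and the site associations are assumptions, not an official topology:

```python
# Hypothetical sketch of 1st-level support routing in the hierarchical model.
# The site lists and associations are illustrative only.
TIER1_OF = {            # Tier-2 site -> associated Tier-1 ("primary site")
    "Cambridge": "RAL",
    "Legnaro": "CNAF",
    "Prague": "FZK",
}
TIER1_SITES = {"RAL", "CNAF", "FZK", "IN2P3"}

def first_level_support(site: str) -> str:
    """Return who provides 1st-level operational support for a given site."""
    if site in TIER1_SITES:
        return "CERN (Operations Management Centre)"
    if site in TIER1_OF:
        return f"{TIER1_OF[site]} (associated Tier-1 / primary site)"
    return "Grid Operations Centre (GOC)"   # assumed fallback for unlisted sites

print(first_level_support("RAL"))        # -> CERN (Operations Management Centre)
print(first_level_support("Cambridge"))  # -> RAL (associated Tier-1 / primary site)
```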

Support Teams within LCG
- Global Grid User Support (GGUS): single point of contact, coordination of user support
- CERN Deployment Support (CDS): middleware problems
- Grid Operations Centre (GOC): operations problems
- Resource Centres (RC): hardware problems
- Experiment Specific User Support (ESUS): software problems
- User communities: the 4 LHC experiments (ALICE, ATLAS, CMS, LHCb), 4 non-LHC experiments (BaBar, CDF, Compass, D0) and other communities (VOs)
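Read as a dispatch table, the structure above routes each problem class from GGUS – the single point of contact – to the responsible team. A hedged sketch; the function and the category strings are invented for illustration and are not real GGUS code:

```python
# Illustrative dispatch of a user-reported problem to the responsible support team,
# following the categories on the slide (invented names, not an actual GGUS interface).
SUPPORT_TEAM = {
    "middleware": "CERN Deployment Support (CDS)",
    "operations": "Grid Operations Centre (GOC)",
    "hardware":   "Resource Centre (RC) of the affected site",
    "experiment software": "Experiment Specific User Support (ESUS)",
}

def route_ticket(category: str) -> str:
    """GGUS acts as single point of contact and forwards by problem category."""
    return SUPPORT_TEAM.get(category, "GGUS triage (coordination of user support)")

print(route_ticket("middleware"))  # -> CERN Deployment Support (CDS)
print(route_ticket("unknown"))     # -> GGUS triage (coordination of user support)
```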

Experiences in deployment

LCG now covers many sites (>70) – both large and small
- Large sites – existing infrastructures – need to add on grid interfaces etc.
- Small sites want a completely packaged, push-button, out-of-the-box installation (including batch system, etc.)
- Satisfying both simultaneously is hard – it requires very flexible packaging, installation and configuration tools and procedures
  - A lot of effort had to be invested in this area

There are many problems – but in the end we are quite successful
- The system is stable and reliable
- The system is used in production
- The system is reasonably easy to install now – 60 sites
- We now have a basis on which to incrementally build essential functionality

This infrastructure forms the basis of the initial EGEE production service

LCG Operations → EGEE Operations

What is EGEE? (I)

EGEE (Enabling Grids for E-science in Europe) is a seamless Grid infrastructure for the support of scientific research, which:
- Integrates current national, regional and thematic Grid efforts
- Provides researchers in academia and industry with round-the-clock access to major computing resources, independent of geographic location

[Diagram: applications layered on the Grid infrastructure, which builds on the Géant network]

What is EGEE? (II)
- 70 leading institutions in 28 countries, federated in regional Grids
- 32 M Euros EU funding (2004–5), O(100 M) total budget
- Aiming for a combined capacity of over 8000 CPUs (the largest international Grid infrastructure ever assembled)
- ~300 persons

EGEE Activities

Emphasis on operating a production grid and supporting the end-users
- 48% service activities (Grid Operations, Support and Management, Network Resource Provision)
- 24% middleware re-engineering (Quality Assurance, Security, Network Services Development)
- 28% networking (Management, Dissemination and Outreach, User Training and Education, Application Identification and Support, Policy and International Cooperation)

LCG and EGEE Operations

EGEE is funded to operate and support a research grid infrastructure in Europe

The core infrastructure of the LCG and EGEE grids is now operated as a single service, growing out of the LCG service
- LCG includes the US and Asia-Pacific; EGEE includes other sciences
- A substantial part of the infrastructure is common to both
- The LCG Deployment Manager is the EGEE Operations Manager
- The CERN team (Operations Management Centre) provides coordination, management and 2nd-level support

Support activities are expanded with the provision of
- Core Infrastructure Centres (CIC) (4)
- Regional Operations Centres (ROC) (9)
  - The ROCs are coordinated by Italy, outside of CERN (which has no ROC)

LCG → EGEE in Europe

User support:
- Becomes hierarchical, through the Regional Operations Centres (ROCs), which
  - act as front-line support for user and operations issues
  - provide local knowledge and adaptations

Coordination:
- At CERN (Operations Management Centre) and at the CICs for HEP

Operational support:
- The LCG GOC is the model for the EGEE CICs
  - The CICs replace the European GOC at RAL
  - They also run essential infrastructure services
  - Provide support for other (non-LHC) applications
  - Provide 2nd-level support to the ROCs

Summary

The data challenges demonstrated:
- Many middleware functional and performance issues (documented)
- The main problem is service stability
  - Site fabric management, configuration, change control, etc.
  - Grid3 reported similar problems
- The user support process needs improvement

Now moving into continuous production + service & data challenges

How to move forward – 1

Build an agreed operations model for the next year
- It should be able to evolve

Operations/Fabric workshop, Nov 2–4
- HEPiX half-day – input from some sites and from Grid3/OSG on their plans
- Documenting use cases (based on experience) and proposing support mechanisms for each
- EGEE SA1 infrastructure
- 5 working groups:
  - Operations support
  - User support
  - Operational security
  - Fabric management issues
  - Software needs and tools → requirements from operations

Need fabric management training for many sites

Some issues

Resource Centres:
- Large sites have operations staff and/or on-call support
- Small sites have no on-call support and often little support at all

Regional Operations Centres:
- Probably do not provide after-hours or on-call support. If they did, the support model could lean more on the ROCs; however, it is clear that most ROCs will not have this level of support.

Core Infrastructure Centres:
- Must have on-call support after hours
  - To be rotated through the 4 or 5 active CICs

Thus, a basic question to answer is how much power or control the CICs should have in order to deal with problems when staff at RCs and ROCs are not available:
- Either the CICs have the right to manage critical services on sites where there is no support, or
- they have the right to remove "broken" sites and services from the infrastructure.
It is likely that we will have all combinations of these …
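Purely as an illustration of the question raised on this slide, the sketch below encodes the two alternatives (delegated management rights versus removal of the site) as a decision rule; none of this reflects an agreed policy, and the function and flags are hypothetical:

```python
# Hypothetical decision rule for what a CIC may do with an unreachable site,
# encoding the two options discussed above. Not an agreed LCG/EGEE policy.
def cic_action(site_has_local_support: bool,
               roc_reachable: bool,
               cic_may_manage_services: bool) -> str:
    if site_has_local_support or roc_reachable:
        return "escalate to site/ROC on-call staff"
    if cic_may_manage_services:
        return "CIC manages/restarts the critical services on the site"
    return "CIC removes the 'broken' site or service from the infrastructure"

# Small site at night, no ROC on-call, no delegated management rights:
print(cic_action(site_has_local_support=False, roc_reachable=False,
                 cic_may_manage_services=False))
```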

Immediate actions

Weekly operations meeting (Monday afternoon)
- Weekly reports from ROCs, CICs, other Tier-1s, etc.

Operations Manager
- The role rotates through the 4 EGEE CICs – manage problem reporting and follow-up
- Responsibility is handed over at the weekly meeting

Operational security team
- Being set up – led by Ian Neilson; strong collaboration between the US and Europe on these issues.
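As a small illustration of the weekly rotation and Monday hand-over described above, the sketch below generates a rota; the CIC names and the start date are placeholders rather than the actual 2004 schedule:

```python
# Illustrative weekly rota for the rotating Operations Manager role.
# CIC names and the start date are placeholders, not the real schedule.
from datetime import date, timedelta
from itertools import cycle, islice

CICS = ["CIC-1", "CIC-2", "CIC-3", "CIC-4"]          # the 4 active CICs (placeholders)

def rota(start: date, weeks: int):
    """Yield (Monday of the week, CIC on duty); hand-over happens at the Monday meeting."""
    monday = start - timedelta(days=start.weekday())  # align to the Monday meeting
    for week, cic in enumerate(islice(cycle(CICS), weeks)):
        yield monday + timedelta(weeks=week), cic

for monday, cic in rota(date(2004, 10, 25), 6):
    print(monday.isoformat(), "->", cic)
```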