Deploying the LHC Computing Grid – The LCG Project
Ian Bird, IT Division, CERN
CHEP 2003, 27 March 2003

The Large Hadron Collider Project
- 4 detectors: ALICE, ATLAS, CMS, LHCb
- Storage – raw recording rate 0.1–1 GByte/sec, accumulating 5–8 PetaBytes/year; 10 PetaBytes of disk
- Processing – 200,000 of today's fastest PCs
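
A quick back-of-the-envelope check of these volumes, as a minimal sketch: the ~10^7 seconds of effective data taking per year is an assumption used for illustration, not a figure from the slide.

```python
# Rough consistency check of the raw-data volumes quoted above.
# Assumption: ~1e7 seconds of effective data taking per year (an
# illustrative rule of thumb, not a number taken from the slide).
SECONDS_OF_DATA_TAKING = 1.0e7

for rate_gb_per_s in (0.1, 1.0):
    pb_per_year = rate_gb_per_s * SECONDS_OF_DATA_TAKING / 1.0e6  # 1 PB = 1e6 GB
    print(f"{rate_gb_per_s:4.1f} GB/s -> ~{pb_per_year:.0f} PB/year of raw data")

# 0.1 GB/s gives ~1 PB/year and 1 GB/s gives ~10 PB/year, which brackets
# the 5-8 PB/year accumulation quoted for the experiments combined.
```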

- CERN will provide the data reconstruction & recording service (Tier 0) – but only a small part of the analysis capacity
- Current planning for capacity at CERN + principal Regional Centres:
  - 2002: 650 kSI2000 – <1% of the capacity required in 2008
  - 2005: 6,600 kSI2000 – <10% of the 2008 capacity
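
The two fractions above imply a lower bound on the total 2008 capacity; a small worked check using only the figures on this slide:

```python
# Lower bounds on the 2008 capacity requirement implied by the fractions above.
capacity_2002 = 650      # kSI2000, stated to be < 1% of the 2008 requirement
capacity_2005 = 6_600    # kSI2000, stated to be < 10% of the 2008 requirement

bound_from_2002 = capacity_2002 / 0.01   # > 65,000 kSI2000
bound_from_2005 = capacity_2005 / 0.10   # > 66,000 kSI2000

print(f"2008 requirement > {max(bound_from_2002, bound_from_2005):,.0f} kSI2000")
# i.e. roughly a hundred times the capacity installed in 2002.
```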

Multi-Tiered View of LHC Computing
(diagram of the multi-tiered computing hierarchy, with network links of up to 1–10 Gb/s between tiers)

The LHC Computing Model
(diagram: the CERN Tier 0 centre feeds Tier 1 centres in Germany, the USA, the UK, France, Italy, at CERN and elsewhere; Tier 2 regional groups of labs and universities; Tier 3 physics-department resources; and individual desktops, serving the CMS, ATLAS and LHCb physics groups)

The LCG Project

The LHC Computing Grid Project
Goals: prepare and deploy the computing environment for the LHC experiments
- Common applications, tools, frameworks and environments
- Move from testbed systems to real production services:
  - operated and supported 24x7 globally
  - computing fabrics run as production physics services
  - the computing environment must be robust, stable, predictable, and supportable
- Foster collaboration and coherence of the LHC computing centres
- LCG is not a middleware development or grid technology project: it is a grid deployment project

The LHC Computing Grid Project – two phases
Phase 1 –
- development and prototyping
- approved by CERN Council, 20 September 2001
Phase 2 –
- installation and operation of the full world-wide initial production Grid
- costs (materials + staff) included in the LHC cost-to-completion estimates

The LHC Computing Grid Project
Phase 1 goals –
- Prepare the LHC computing environment:
  - provide the common tools and infrastructure for the physics application software
  - establish the technology for fabric, network and grid management (buy, borrow, or build)
  - develop models for building the Phase 2 Grid
  - validate the technology and models by building progressively more complex Grid prototypes
  - operate a series of data challenges for the experiments
  - maintain reasonable opportunities for the re-use of the project's results in other fields
- Deploy a 50% model* production grid including the committed LHC Regional Centres
- Produce a Technical Design Report for the full LHC Computing Grid to be built in Phase 2 of the project
* 50% of the complexity of one of the LHC experiments

LCG Deployment Plan – Level 1 Milestones
- M1.1 – July 03 – First Global Grid Service (LCG-1) available
- M1.2 – June 03 – Hybrid Event Store (Persistency Framework) available for general users
- M1.3a – November 03 – LCG-1 reliability and performance targets achieved
- M1.3b – November 03 – Distributed batch production using grid services
- M1.4 – May 04 – Distributed end-user interactive analysis from a "Tier 3" centre
- M1.5 – December 04 – "50% prototype" (LCG-3) available
- M1.6 – March 05 – Full Persistency Framework
- M1.7 – June 05 – LHC Global Grid TDR

Schedule – Aggressive?
- To be ready for data taking in Spring 2007:
  - need 1 year to procure, build and test the full LHC computing fabrics
  - so the Computing TDR must be written in mid-2005
- Need at least 1 year of experience in operating a production grid to validate the computing model
- Thus LCG must be running the experiments' data challenges in 2004, with a reasonable level of "production" service

Centres taking part in the LCG prototype service – around the world, around the clock

Centres taking part in the LCG prototype service
Tier 0:
- CERN
Tier 1 Centres:
- Brookhaven National Lab
- CNAF Bologna
- Fermilab
- FZK Karlsruhe
- IN2P3 Lyon
- Rutherford Appleton Lab (UK)
- University of Tokyo
- CERN
Other Centres:
- Academia Sinica (Taipei)
- Barcelona
- Caltech
- GSI Darmstadt
- Italian Tier 2s (Torino, Milano, Legnaro)
- Manno (Switzerland)
- Moscow State University
- NIKHEF Amsterdam
- Ohio Supercomputing Centre
- Sweden (NorduGrid)
- Tata Institute (India)
- TRIUMF (Canada)
- UCSD
- UK Tier 2s
- University of Florida, Gainesville
- University of Prague
- ……

LCG Resource Commitments – 1Q04
(table of committed resources – CPU (kSI2K), disk (TB), support (FTE) and tape (TB) – from CERN, the Czech Republic, France, Germany, Holland, Italy, Japan, Poland, Russia, Taiwan, Spain, Sweden, Switzerland, the UK and the USA, plus totals)

LCG Project Implementation
Four work areas –
- Applications
- Grid Technology
- Fabrics
- Grid Deployment

Applications Area
- Base support for the development process, infrastructure, tools, libraries
- Frameworks for simulation and analysis
- Projects common to several experiments:
  - everything that is not an experiment-specific component is a potential candidate for a common project
  - long-term advantages in use of resources, support, maintenance
- Object persistency and data management

Grid Technology in LCG
- LCG expects to obtain Grid technology from:
  - projects funded by national and regional e-science initiatives, and
  - industry
- concentrating instead on deploying a global grid service

A few of the Grid projects with strong HEP collaboration
- US projects
- European projects
- Many national and regional Grid projects – GridPP (UK), INFN-Grid (Italy), NorduGrid, Dutch Grid, …

Grid Technology in LCG
This area of the project is concerned with:
- ensuring that the LCG requirements are known to current and potential Grid projects
- active lobbying for suitable solutions – influencing plans and priorities
- evaluating potential solutions
- negotiating support for tools developed by Grid projects
- developing a plan to supply solutions that do not emerge from other sources
BUT this must be done with caution:
- important to avoid HEP-special solutions
- important to migrate to standards as they emerge (avoid emotional attachment to prototypes)

LCG Grid Technology Organisation
(organisation chart linking a grid technology manager with STAG – the strategic technical advisory group – and the GAG – the grid applications group – through requirements, consultation and recommendations, and with the US projects and the associated national and regional Grid projects – GridPP (UK), INFN-Grid (Italy), NorduGrid, Dutch Grid, … – through negotiation of deliverables)

Grid Technology Status
- A base set of requirements has been defined (HEPCAL):
  - 43 use cases
  - ~2/3 of which should be satisfied around 2003 by currently funded projects
- Good experience of working with Grid projects in Europe and the United States
- Practical results from testbeds used for physics simulation campaigns
- LCG-1 plan (which will evolve):
  - VDT as the basis
  - EDG components provide higher-level functionality

Grid Technology – Next Steps
- Leverage the massive investments being made:
  - proposals being prepared both in the EU and the US
  - target: solid (re-)engineering of the current prototypes
- Expect several major architectural changes before things mature

Fabric Area
- CERN Tier 0+1 centre:
  - automated systems management package – autonomic computing
  - evolution & operation of the CERN prototype – integrating the base LHC computing services into the LCG grid
- Tier 1, 2 centre collaboration:
  - develop and share experience on installing and operating a Grid
  - exchange information on planning and experience of large fabric management
  - look for areas for collaboration and cooperation
  - use HEPiX as the communications forum
- Technology tracking & costing:
  - new technology assessment nearing completion (PASTA III)
  - re-costing of Phase 2 is being done in light of PASTA III and the re-assessment of experiment trigger rates and event sizes (LHCC)

Grid Deployment – deploying a production service

Deployment Goals for LCG-1
- Production service for the Data Challenges in 2H03 & 2004
  - initially focused on batch production work
- Experience in close collaboration between the Regional Centres
  - must have wide enough participation to understand the issues
  - learn how to maintain and operate a global grid
- Focus on a production-quality service
  - robustness, fault-tolerance, predictability, and supportability take precedence; additional functionality gets prioritised
- LCG should be integrated into the sites' physics computing services – it should not be something apart
- This requires coordination between participating sites in:
  - policies and collaborative agreements
  - resource planning and scheduling
  - operations and support

Timeline for the LCG computing service
(timeline chart covering the LCG-1, LCG-2 and LCG-3 services: VDT and EDG tools building up to basic functionality; LCG-1 used for simulated event productions; stable first-generation and then more stable second-generation middleware, with management and operations tools being developed; principal service for the LHC data challenges – batch analysis and simulation; validation of the computing models and of the computing service; computing model TDRs and the Phase 2 TDR; then very stable, full-function middleware, acquisition, installation and commissioning of the Phase 2 service for LHC startup, and the Phase 2 service in production)

The LHC Global Grid Service
LCG-1 – first pilot – target July 2003:
- data replication, migration
- sustained 24x7 service
- including sites from three continents
- several times the capacity of the CERN facility
- and as easy to use
And then evolve to the LHC production service:
- reliability, availability
- add more sites, more capacity
- service quality
- performance, efficiency – scheduling, data migration, data transfer
- develop interactive services
- migrate to de-facto standards as they emerge

Elements of a Production LCG Service
- Middleware:
  - testing and certification
  - packaging, configuration, distribution and site validation
  - support – problem determination and resolution; feedback to middleware developers
- Operations:
  - grid infrastructure services
  - site fabrics run as production services
  - operations centres – trouble and performance monitoring, problem resolution – 24x7 globally
- Support:
  - experiment integration – ensure optimal use of the system
  - user support – call centres/helpdesk – global coverage; documentation; training

General Strategy
- Use middleware, software and tools that already exist, developed by the various grid projects
- Integrate these tools as needed, with a well-defined testing and certification process
- Forge collaborations, common projects and agreements to fill in the missing pieces, support, etc.:
  - with grid development projects
  - with other deployment projects
  - with standards bodies (e.g. GGF)

Middleware
- Combined US and EU toolkits
- Now: VDT + EDG + GLUE schema
- This is being used to:
  - set up the testing & certification, deployment process and support structures
  - address issues of integration into regional centre production environments
- End of April: EDG 2.0, built using VDT as the basis and including the GLUE schema
  - this is what will be used in the initial production service in July
  - this is significant – it should allow inter-operation between EDG and VDT sites and LCG
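
As an illustration of how the GLUE schema is typically consumed, the sketch below queries a GLUE-1.x information index over LDAP. This is a hypothetical example rather than part of any LCG release: the endpoint `bdii.example.org:2170`, the base DN and the attribute selection follow common MDS/BDII conventions, and the `python-ldap` package is assumed to be installed.

```python
# Illustrative only: query a GLUE-1.x information index (MDS/BDII style)
# for computing elements and their free CPUs. Hostname is a placeholder.
import ldap  # python-ldap package (assumed available)

INDEX_URL = "ldap://bdii.example.org:2170"      # placeholder endpoint
BASE_DN = "mds-vo-name=local,o=grid"            # conventional GLUE base DN

conn = ldap.initialize(INDEX_URL)
results = conn.search_s(
    BASE_DN,
    ldap.SCOPE_SUBTREE,
    "(objectClass=GlueCE)",                     # GLUE computing-element entries
    ["GlueCEUniqueID", "GlueCEStateFreeCPUs"],  # a couple of GLUE attributes
)

for dn, attrs in results:
    ce = attrs.get("GlueCEUniqueID", [b"?"])[0].decode()
    free = attrs.get("GlueCEStateFreeCPUs", [b"0"])[0].decode()
    print(f"{ce}: {free} free CPUs")
```

A resource broker consumes the same published information, via its own matchmaking, when deciding where to send jobs.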

LCG-1 Deployment Strategy
Deploy to the Tier 1 centres (Pilot 1 start – Feb 1):
  Date      Regional Centre           Experiment
  15/2/03   CERN                      All
  28/2/03   CNAF, RAL                 All
  30/3/03   FNAL                      CMS
  15/4/03   Taiwan                    ATLAS
  30/4/03   Karlsruhe                 All
  7/5/03    IN2P3                     All
  15/5/03   BNL                       ATLAS
  21/5/03   Russia (Moscow), Tokyo    All
LCG-1 Initial Public Service start – July 1
Then, in parallel: Tier 2s, using Tier 1s as support

Grid Deployment Organisation
(organisation chart: a Grid Deployment manager; the Grid Deployment Board (GDB) providing policies, strategy, scheduling, standards and recommendations; a Grid Resource Coordinator handling resource requests – compute & storage – from ALICE, ATLAS, CMS and LHCb; grid infrastructure services; CERN-based teams covering the LCG security group, the grid infrastructure team (core infrastructure), the LCG operations team, the experiment support team, LCG toolkit integration & certification with a joint Trillium/EDG/LCG testing team, plus security tools, an operations call centre and grid monitoring; regional centre operations teams; and anticipated teams at other institutes)

Grid Deployment Board
- Representatives from the experiments and from each country with an active Regional Centre taking part in the LCG Grid Service
- Forges the agreements, takes the decisions, and defines the standards and policies that are needed to set up and manage the LCG Global Grid Services
- Coordinates the planning of resources for physics and computing data challenges
- Initial task was the detailed definition of LCG-1, the initial LCG Global Grid Service:
  - including the set of grid middleware tools to be deployed, the deployment schedule, the security model, and the operations and support model

Certification and Testing
- Will be an ongoing major activity of LCG – part of what will make LCG a production-level service
- Goals:
  - certify/validate that middleware behaves as advertised and provides the required functionality (HEPCAL)
  - stabilise and robustify middleware
  - provide debugging, problem resolution and feedback to developers
- Testing activities at all levels (see the sketch below):
  - component/unit tests
  - basic functional tests, including tests of distributed (grid) services
  - application-level tests – based on the HEPCAL use cases
  - experiment beta-testing before release
  - site configuration verification
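
A minimal sketch of how these testing levels can be chained into a certification gate; the level names follow the list above, while the individual test functions are placeholders rather than real LCG test suites.

```python
# Illustrative sketch of staged certification: each level must pass before
# the next one runs, and failures are reported back to the developers.
# The individual test functions are placeholders, not real LCG test suites.

def unit_tests():       return []   # component/unit level
def functional_tests(): return []   # basic tests of distributed grid services
def hepcal_tests():     return []   # application-level tests (HEPCAL use cases)

CERTIFICATION_LEVELS = [
    ("unit", unit_tests),
    ("functional", functional_tests),
    ("application (HEPCAL)", hepcal_tests),
]

def certify(candidate_release: str) -> bool:
    """Run the test levels in order; stop and report at the first failure."""
    for level, run in CERTIFICATION_LEVELS:
        failures = run()
        if failures:
            print(f"{candidate_release}: {level} tests failed -> feedback to developers: {failures}")
            return False
    print(f"{candidate_release}: certified, ready for experiment beta-testing")
    return True

certify("lcg1-candidate-1")
```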

Certification & Testing
- Certification process – a common process has been agreed with EDG
- A joint project has been agreed with VDT (US):
  - VDT provides basic-level (Globus, Condor) test suites
  - we provide higher-level testing
- Will also have application-level testing – standard benchmarks as well as experiment beta-testing and the HEPCAL tests
- Looking at using common tools and frameworks where it makes sense (NMI/VDT–LCG)
- Certification testbeds:
  - local "grid" at CERN
  - extended to a distributed testbed – U. Wisconsin and others
- Site verification is also an essential component
- Exception handling has not really been addressed at all …

Test and Validation process
(process diagram spanning four stages – unit test, build, certification, production: work packages (WPs) add unit-tested code to the CVS repository and run individual WP tests; the build system runs nightly builds and automatic tests, producing tagged packages and release candidates; the integration team runs overall release tests on a development testbed (~15 CPUs); tagged releases selected for certification go to the certification testbed (~40 CPUs) for grid certification and application certification with the application representatives; anomalies are reported through Bugzilla and problems fixed by the WPs; certified releases selected for deployment become certified public releases for use by the applications, users and developers' machines in production; the test group and integration team provide office-hours and 24x7 support at the different stages)

Packaging and distribution
- Obviously a major issue for a deployment project
- Want to provide a tool that satisfies the needs of the participating sites:
  - interoperates with existing tools where appropriate and necessary
  - does not force a solution on sites with established infrastructure
  - provides a solution for sites with nothing
- Configuration is an essential component:
  - essential to understand and validate correct site configuration
  - effort will be devoted to providing configuration tools
  - verification of correct configuration will be required before sites join LCG (see the sketch below)
  - subject of a collaborative project
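
The kind of check meant by "verification of correct configuration" could look like the sketch below. It is purely illustrative: the host names, ports and required commands are assumptions, not actual LCG requirements.

```python
# Illustrative site-configuration verification: confirm that a few
# externally required services are reachable and that expected client
# commands are installed. All hosts, ports and command names are placeholders.
import shutil
import socket

REQUIRED_SERVICES = [
    ("rb.example.org", 7772),     # e.g. a resource-broker port (placeholder)
    ("bdii.example.org", 2170),   # e.g. an information-index port (placeholder)
]
REQUIRED_COMMANDS = ["globus-url-copy"]   # example client tool (placeholder list)

def verify_site() -> bool:
    ok = True
    for host, port in REQUIRED_SERVICES:
        try:
            with socket.create_connection((host, port), timeout=5):
                print(f"OK      {host}:{port} reachable")
        except OSError as exc:
            print(f"FAILED  {host}:{port} not reachable ({exc})")
            ok = False
    for cmd in REQUIRED_COMMANDS:
        if shutil.which(cmd):
            print(f"OK      {cmd} found")
        else:
            print(f"FAILED  {cmd} missing")
            ok = False
    return ok

if __name__ == "__main__":
    verify_site()
```

A real verification suite would be distributed with the middleware release and run before a site is admitted to the service.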

LCG Operations
- Responsible for operating and maintaining the grid infrastructure and associated services:
  - gateways, information services, resource broker etc. – i.e. the grid-specific services
  - will be a coordination between teams at CERN and at the Regional Centres
- Responsible also for the VO infrastructure and the Authentication and Authorisation services
- Security operations – incident response etc.
- Build Grid Operations Centre(s):
  - performance and problem monitoring
  - troubleshooting and coordination with site operations, user support, network operations etc.
  - accounting and reporting
- Leverage existing experience and ideas; assemble monitoring, reporting, performance and other tools
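
As a toy illustration of the kind of aggregation a Grid Operations Centre performs, the sketch below polls a per-site status probe and summarises availability; the site list and the probe are invented placeholders, not an actual LCG monitoring tool.

```python
# Toy illustration of an operations-centre view: run a status probe per
# site and summarise availability. Site names and the probe are invented.
from typing import Callable, Dict

SITES = ["CERN", "CNAF", "RAL", "FNAL"]   # placeholder list of centres

def probe(site: str) -> bool:
    """Placeholder probe; a real one would submit a test job or query sensors."""
    return True

def operations_summary(sites, check: Callable[[str], bool]) -> Dict[str, str]:
    status = {site: ("UP" if check(site) else "DOWN") for site in sites}
    up = sum(1 for s in status.values() if s == "UP")
    print(f"{up}/{len(sites)} sites available")
    return status

operations_summary(SITES, probe)
```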

Monitoring tools

Security
- GOAL: do not make exceptions for LCG services – they must run integrated into a site's infrastructure and be subject to all the usual security and good-management procedures and policies
- BUT: initially we are certain to need exceptions and compromises, since until now most grid middleware has sidestepped security issues
- THUS: we must have a sound security policy and an agreed plan that provides for these exceptions in the short term, but shows a clear path to reach the state that the sites require
- This area represents a significant effort and must address many issues:
  - VO management (see the sketch below)
  - usage agreements – brings up legal issues, privacy, …
  - incident response
  - auditing
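
A concrete example of the sort of mechanism VO management has to deal with is the classic grid-mapfile, which maps certificate subjects (DNs) to local accounts. The sketch below parses one and looks up a DN; the entries and the DN are invented, while the quoted-DN-plus-account layout is the conventional Globus format.

```python
# Illustrative grid-mapfile lookup: map a certificate subject (DN) to a
# local account. The entries and the DN below are invented examples; the
# quoted-DN-then-account format is the conventional Globus grid-mapfile layout.
import re

GRIDMAP_TEXT = '''
"/C=CH/O=ExampleCA/OU=HEP/CN=Alice Analyst" lhcb001
"/C=UK/O=ExampleCA/OU=HEP/CN=Bob Builder" atlas007
'''

ENTRY_RE = re.compile(r'^"(?P<dn>[^"]+)"\s+(?P<account>\S+)$')

def parse_gridmap(text: str) -> dict:
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = ENTRY_RE.match(line)
        if match:
            mapping[match.group("dn")] = match.group("account")
    return mapping

gridmap = parse_gridmap(GRIDMAP_TEXT)
dn = "/C=CH/O=ExampleCA/OU=HEP/CN=Alice Analyst"
print(gridmap.get(dn, "not authorised"))   # -> lhcb001
```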

Support Activities
- Essential for a production-level service
- Experiment integration and consultancy:
  - support for the data challenges
  - ensure optimal use of resources; ensure experiment applications use the middleware optimally
- Middleware support – problem determination, resolution, feedback to developers
- Call centres: 24x7 support, single point of contact
- User support for expert users; coordination of local support activities
- Documentation and training
- Collaborate with operations centres and local user support (helpdesks)

Future Strategy
- Many LCG sites:
  - participate in other grids
  - provide resources for other HEP experiments
  - provide resources for other sciences
- LCG cannot exist in isolation:
  - must collaborate on standards, projects and implementations of mutual benefit
  - essential to benefit from the experience of currently running experiments trying to use grid services

Deployment Summary
- Deploy middleware to support the essential functionality; the goal is then to evolve and incrementally add functionality
- The added value is to robustify and support the middleware and turn it into a 24x7 production service
- How?
  - certification & test procedure, with tight feedback to the developers – support agreements with the grid projects must be developed to ensure this
  - define missing functionality and require it from the providers
  - provide documentation and training
  - provide the missing operational services
  - provide a 24x7 operations and call centre – guaranteed response, a single point of contact for a user
  - make the software easy to install – facilitate new centres joining
- Deployment is a major activity of LCG – it encompasses all operational and practical aspects of a grid
- There is a lot of work already done that must be leveraged – many opportunities for synergy and collaboration

Conclusions
- Moving from development to production is difficult: it requires a lot of detailed work and needs significant investment
- There is a growing body of experience that must be built upon
- There is a good chance now to build common toolkits, share developments, and work together on certification, packaging, etc.
- We are forced to interoperate with other HENP experiments and other science applications: LCG cannot exist in isolation
  - but this is a good thing, although it makes life harder initially