LCG Milestones for Deployment, Fabric, & Grid Technology
Ian Bird, LCG Deployment Area Manager
PEB, 3-Dec-2002
CERN Ian.Bird@cern.ch
M1.1 First Global Service Initial Availability: July 2003
This comprises the construction and commissioning of the first LHC computing service for physics usage. The service must offer reliable 24x7 availability to all 4 LHC experiments and include some 10 Regional Centres in Europe, North America, and Asia. The milestone includes delivery of the associated Technical Design, containing a description of the architecture, functionality, and quantified technical specifications of performance (capacity, throughput, reliability, availability). It must also include middleware specifications, agreed as a common toolkit by Europe and the US. The service must prove functional, providing a batch service for event production and analysis of the simulated data set. For the milestone to be met, operation must be sustained reliably for a 7-day period, during which stress tests and user productions will be executed with a failure rate below 1%.
L2 milestones for M1.1
(a) Define LCG-1 in terms of functionality, resources, operations, security, and support
(b) Series of evolving pilot services for testing, with increasing resources
(c) Testing, certification, packaging and release of software
(d) Set up infrastructure and operational procedures
(e) Set up operations centre and help desk (call centre)
(f) LCG-1 commissioning and acceptance
M1.1 (a) Define LCG-1 functionality, resources, operations, security
The 5 working groups of the Grid Deployment Board (GDB) will define LCG-1:
1. Functionality needed by the experiments for their data challenges; identify the VDT and EDG components that provide it; negotiate support agreements with the providers.
2. Resources and Regional Centres to participate in LCG-1; deployment schedule and resource ramp-up. Define the resource request and review process.
3. Negotiate an initial security model (authentication, authorization, etc.) acceptable to all centres, and provide a plan to meet the centres' full requirements.
4. Define operating procedures, and negotiate agreements with the centres to put them into place.
5. Define the user support model.
WG1–4 will provide an interim report on December 9, 2002, and a final report on February 1, 2003; WG5 will report 3 months later.
M1.1 (b) Series of evolving pilot services for testing, with increasing resources
Pilot-1 service – February 1, 2003: 50 machines (CE), 10 TB (SE). Runs the middleware currently on the LCG testbeds. Initial testbed at CERN; add 1 remote site by February 28, 2003.
Pilot-2 service – March 15, 2003: 100 machines (CE), 10 TB (SE). The CERN service will run a full prototype of the WP4 installation and configuration system.
Add 1 US site to the pilot – March 30, 2003
Add 1 Asian site to the pilot – April 15, 2003
Add 2-3 more EU and US sites – April to May 2003
Service includes 6-7 sites – June 1, 2003
LCG-1 initial production system – July 2003: 200 machines (CE), 20 TB (SE). Uses the full WP4 system with fully integrated fabric infrastructure. The global service has 6-7 sites on 3 continents.
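The pilot ramp-up above can be captured as a small table and checked mechanically. This is a minimal sketch, assuming the dates and capacities listed on this slide and taking the month-only date "July 2003" as 1 July (an assumption); `ServiceStage`, `is_monotonic_ramp` and `growth` are hypothetical names, not part of any LCG tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ServiceStage:
    """One step of the pilot ramp-up (hypothetical structure, not an LCG tool)."""
    name: str
    due: date         # month-only dates are taken as the 1st (assumption)
    ce_machines: int  # machines behind the compute element (CE)
    se_tb: int        # storage element (SE) capacity in TB


# Capacities and dates as listed for the pilot services and LCG-1.
STAGES = [
    ServiceStage("Pilot-1", date(2003, 2, 1), 50, 10),
    ServiceStage("Pilot-2", date(2003, 3, 15), 100, 10),
    ServiceStage("LCG-1 initial", date(2003, 7, 1), 200, 20),
]


def is_monotonic_ramp(stages):
    """True if stages are in date order and neither CE nor SE capacity shrinks."""
    return all(
        a.due < b.due and a.ce_machines <= b.ce_machines and a.se_tb <= b.se_tb
        for a, b in zip(stages, stages[1:])
    )


def growth(stages):
    """Growth factors (CE, SE) from the first stage to the last."""
    first, last = stages[0], stages[-1]
    return last.ce_machines / first.ce_machines, last.se_tb / first.se_tb


print(is_monotonic_ramp(STAGES))  # True
print(growth(STAGES))             # (4.0, 2.0)
```

Between Pilot-1 and the initial LCG-1 system, compute capacity grows fourfold while storage doubles.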
M1.1 (c) Testing, certification, packaging and release of software
This is the process by which we make the service reliable and supportable as a production service.
Certification, testing, and release process defined – January 2003. Verifies functionality, robustness, etc.; essential for providing a production service. The process defined for EDG will be modified for LCG.
Packaging/configuration mechanism defined – March 2003. Needed to automate installation and configuration. A collaborative activity between LCG and the grid projects; requirements gathering is in progress.
Delivery of middleware software packages – March 1, 2003. This is delivery to LCG from the grid middleware providers.
Iterative, incremental release cycle, with major functional releases: V1.0 – June 1, 2003; V1.1 – October 1, 2003. Incremental releases to improve stability and robustness, and to fix problems.
M1.1 (d) Set up infrastructure & operational procedures – January to June 2003
Schedule and details driven by the outcome of the GDB working groups.
Certificate Authorities and VO management systems in place – May 2003. Based on existing inter-operating EU and US systems.
Deploy grid services to participating sites as they come online, according to the WG2 schedule.
Agreement on responsibilities for management of services. This is the outcome of WG4 – February 1, 2003.
Resource accounting and reporting procedures set up – May 2003
Security procedures defined and agreed – June 2003. Incident response and security management.
M1.1 (e) Set up operations centre and help desk (call centre)
Identify operations and call centre locations – February 1, 2003
A call centre to provide operational and helpdesk support, distributed across 2 sites initially to provide reasonable coverage.
Monitoring system based on tools used in the testbeds and recent demonstrations; existing experience in TeraGrid, iVDGL, and DataTAG.
Needs a problem tracking database; there are several candidate systems.
In place by June 2003
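As a rough illustration of how distributing the call centre across 2 sites extends coverage, the sketch below computes which UTC hours are staffed. The two sites and their shift hours are purely hypothetical (the slide names neither), and `staffed_hours` and `coverage` are illustrative helpers only.

```python
def staffed_hours(start_utc, end_utc):
    """UTC hours in [start, end) covered by one site's shift.

    The shift may wrap past midnight; start == end is read as no staffing.
    """
    length = (end_utc - start_utc) % 24
    return {(start_utc + h) % 24 for h in range(length)}


def coverage(sites):
    """Return (number of staffed UTC hours, sorted list of unstaffed hours)."""
    covered = set()
    for start, end in sites.values():
        covered |= staffed_hours(start, end)
    gaps = sorted(set(range(24)) - covered)
    return len(covered), gaps


# Hypothetical shifts, in UTC: a European site 07:00-19:00 and a
# North American site 14:00-02:00. Neither is named on the slide.
SITES = {"site_eu": (7, 19), "site_na": (14, 2)}

print(coverage(SITES))  # (19, [2, 3, 4, 5, 6])
```

With these assumed shifts, the two sites together staff 19 of 24 hours and leave a gap from 02:00 to 07:00 UTC; different shift choices change the result entirely.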
M1.1 (f) LCG-1 commissioning and acceptance – June 2003
A 30-day commissioning period with user productions and stress tests, including a 7-day acceptance period.
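The M1.1 acceptance criterion (reliable operation over a 7-day period within the 30-day commissioning, with a failure rate below 1%) can be checked with a sliding window over daily job statistics. This is a minimal sketch, assuming the failure rate is total failed jobs over total jobs in the window (the slides do not define the aggregation) and using made-up daily figures; the function names are hypothetical.

```python
from typing import List, Optional, Tuple

DayStats = Tuple[int, int]  # (jobs_run, jobs_failed) for one day


def window_failure_rate(days: List[DayStats]) -> float:
    """Aggregate failure rate over a run of days: total failed / total run."""
    total = sum(run for run, _ in days)
    failed = sum(bad for _, bad in days)
    return failed / total if total else 1.0  # an idle window cannot pass


def first_acceptance_window(log: List[DayStats], window: int = 7,
                            max_rate: float = 0.01) -> Optional[int]:
    """Index of the first day opening `window` consecutive days whose
    aggregate failure rate is below `max_rate`; None if no such run exists."""
    for start in range(len(log) - window + 1):
        if window_failure_rate(log[start:start + window]) < max_rate:
            return start
    return None


# Made-up 30-day commissioning log: 10 shaky days (5% failures),
# then 20 stable days (0.5% failures), 1000 jobs per day.
log = [(1000, 50)] * 10 + [(1000, 5)] * 20

print(first_acceptance_window(log))  # 10
```

Note that the window starting on day 9 just misses (80 failures in 7000 jobs, about 1.14%), so the first qualifying 7-day run begins once all the shaky days have dropped out.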
M1.4 Fully Operational LCG-1 Service: November 2003
This comprises the availability of LCG-1 as a fully operational and performant 24x7 production service. Operation must be sustained for a period of 1 month. This service would be used for the “5% data challenges” of the LHC experiments. LCG-1 will be operated continuously, evolving in terms of capacity, performance, and functionality. It includes the addition of Regional Centres as they come online, as defined by GDB Working Group 2. It includes the delivery of the technical service specifications and user documentation, and the deployment/consolidation of an appropriate user support infrastructure. It also includes incremental releases of middleware to improve reliability, robustness, and performance. The service level must be as required for the 2004 data challenges. Achievement of the milestone should be determined and accepted through a review of the service by representatives of the experiments, the Regional Centres, and LCG.
L2 Milestones for M1.4
Define LCG-1 performance goals – July 2003. In concert with the experiments and their data challenge requirements, set performance goals in terms of capacity, throughput, reliability, etc. A GDB working group.
10 Regional Centres participating – October 2003. WG2 defines the implementation schedule, which may be adjusted in July. Add centres 1 at a time until October.
LXBATCH service merged into LCG-1 – October 2003. All resources of LXBATCH will be grid-enabled and accessible as part of the LCG-1 service.
Milestone release of middleware – October 2003: the V1.1 release, with improved functionality.
Review of service – November 2003. The LCG-1 service level should be that required for the 2004 data challenges. Whether the target has been achieved will be determined and accepted in a review of the service by representatives from the experiments, the Regional Centres, and LCG.
M1.6 Fully Operational LCG-3 Service: January 2005
This comprises the construction and commissioning of a fully operational, full-size prototype (LCG-3) of what will be the initial LHC computing production service. Reliable 24x7 operation must be sustained for a period of 1 month. LCG-3 will serve as proof that the LHC computing model works, including Tier 0, 1, 2, and 3 regional centres, and will provide practical backing for the Computing Service TDR. LCG-3 will use the LHC Grid toolkit, will have 50% of the components required for the 2007 production service of CMS or ATLAS, and will be used for the “20% milestones” of the experiments.
L2 Milestones for M1.6
Define LCG-3 – February 2004: functionality (middleware packages); resources and Regional Centre participants; performance goals.
LCG-3 pilot system available – July 2004. Operates in parallel with the LCG-1 production service; used by the experiments for integration and functional tests.
Decision on new batch system software (CERN) – December 2004, following a review of scheduler software alternatives.
Upgrade LCG-1 service to LCG-3 – December 2004 to January 2005. This is a major upgrade that can only be done at a quiet time.
M1.8 Completion of the Computing Service TDR: June 2005
The Computing Service TDR will specify the requirements for the Grid that will be used for the first production services of the four LHC experiments. It will include details of the architecture, functionality, capacity, performance, throughput, and availability. It will include the Regional Centre plans developed to meet these requirements, and will provide cost estimates and an overall installation and verification schedule. It is assumed that the LHCC will approve the TDR within three months of its availability, and that the TDR may be used to provide data for the Memorandum of Understanding for Phase 2 of the project. The full process from acquisition to service verification is expected to take 12-18 months (depending on the administrative procedures of the Regional Centres). The initial service must be in full production by September 2006 (6 months before data taking). The TDR will therefore be approved after the acquisition procedures have started, but before orders are placed.
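The scheduling argument in this paragraph is month arithmetic. This is a minimal sketch at month granularity, assuming each month-only date means the first of that month; `add_months` is a hypothetical helper, and the data-taking date is inferred from "6 months before data taking" rather than stated on the slide.

```python
from datetime import date


def add_months(d, months):
    """Shift a first-of-month date by a whole number of months (may be negative)."""
    index = d.year * 12 + (d.month - 1) + months
    return date(index // 12, index % 12 + 1, 1)


# Month-only dates from the slide, taken as the 1st of the month (assumption).
TDR_AVAILABLE = date(2005, 6, 1)                 # M1.8: June 2005
TDR_APPROVED_BY = add_months(TDR_AVAILABLE, 3)   # LHCC approval within 3 months
FULL_PRODUCTION = date(2006, 9, 1)               # September 2006
DATA_TAKING = add_months(FULL_PRODUCTION, 6)     # inferred: "6 months before data taking"

# Acquisition-to-verification takes 12-18 months, so procedures must start by:
ACQ_START_18 = add_months(FULL_PRODUCTION, -18)  # longest procedures
ACQ_START_12 = add_months(FULL_PRODUCTION, -12)  # shortest procedures

print(TDR_APPROVED_BY)             # 2005-09-01
print(DATA_TAKING)                 # 2007-03-01
print(ACQ_START_18, ACQ_START_12)  # 2005-03-01 2005-09-01
```

With 18-month procedures, acquisition must start by March 2005, three months before the TDR is available; with 12-month procedures, by September 2005, the latest expected approval date. In both cases acquisition is already under way when the TDR is approved, which matches the slide's conclusion.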
L2 Milestones for M1.8 (TDR)
Complete proposals for NSF-ITR and EU-FP6 – April 2003. Programs at the proposal stage to re-engineer, harden, and improve grid middleware.
Report on comprehensive reviews of grid technologies, and define a strategy for missing functionality – July 2003. Reviews to identify technology providers, capabilities, and strategies for LCG-3, including a plan to provide the functions not covered by the above.
Review of progress – July 2004
Experiments’ final analysis models – December 2003. In the light of the first 6 months of experience with LCG-1, the experiments should provide updated analysis models.
SC2 Review – December 2004. Comprehensive review of the experience of the experiments and the Regional Centres in deploying, operating, and using LCG services. Update the requirements and service model for deployment and operation of the final system.
Timelines
[Chart, January 2003 to July 2005. Rows: LCG services (Testbed, Pilot-1, Pilot-2, LCG-1, LCG-3); LCG certification & test; incremental middleware releases; incrementally add regional centres; experiment data challenges for ALICE, ATLAS, CMS, and LHCb (labels: 5%, 10%, DC-2, DC04, DC05). Milestones marked: LCG-1 Defined; LCG-1 Initial Service Available; LCG-1 Full Service Available; LCG-1 Fulfils Performance Goals; LCG-3 Fulfils Performance Goals; Computing TDR.]