D. Britton, 30 June 2006, GridPP3
Proposal Procedure
GridPP1/GridPP2:
  Proposal: Tier-1 £Am, Tier-2 £Bm, Middleware £Cm, Applications £Dm, Management £Em, … Total £Xm
  -> Peer Review
  -> Re-evaluation: Tier-1 £am, Tier-2 £bm, Middleware £cm, Applications £dm, Management £em, … Total £Ym
  -> Apply for Grants: Institute 1 £fm, Institute 2 £gm, Institute 3 £hm, Institute 4 £im, Institute 5 £jm, … Total £Ym
GridPP3:
  Proposal: Tier-1 £Am, Tier-2 £Bm, Middleware £Cm, Applications £Dm, Management £Em, … Total £Xm
  -> Allocate: Institute 1 £Fm, Institute 2 £Gm, Institute 3 £Hm, Institute 4 £Im, Institute 5 £Jm, … Total £Xm
  -> Peer Review
  -> Institute 1 £fm, Institute 2 £gm, Institute 3 £hm, Institute 4 £im, Institute 5 £jm, … Total £Ym
  Is this still a sensible project?
Life after GridPP2
We propose a 7-month transition period for GridPP2, followed by a three-year co-development programme with the LHC Computing Grid, the proposed European Grid Infrastructure (EGI), the Particle Physics experiments and the Institutes.
The GridPP3 project, a continuation of GridPP, will deliver a full-scale Grid for Exploitation to meet the reconstruction, simulation and analysis requirements of experiments across the Particle Physics programme.
Timeframe:
  GridPP2+  Sep 07 to Mar 08
  GridPP3   Apr 08 to Mar 11
Budget: Has not been pre-specified… The input to the exploitation review was £36.6m for this period, which is clearly (above) the upper limit.
GridPP2+
In the 7-month period from Sep-07 to Apr-08 we propose (following the suggestion of the Oversight Committee) to continue the GridPP2 project largely as-is, primarily in order to:
1) Sort out issues with the time-frame for the PPRP process and post-extension in Sep 07.
2) Provide continuity of management and support over the expected start-up phase of the LHC.
3) Align future projects with financial years, with EGEE and a possible future EGI project, and with other grants in the UK.
The proposal is to continue all GridPP2 posts in this period except for the Application posts (which have been applied for via the Rolling Grant mechanism).
We hope (need) to use the GridPP2+ period to install/commission a substantial pulse of hardware to be ready for the start of the LHC.
Earth Wind Water Fire
Proto-GridPP3 PMB
Roles (in slide order): CB Chair; Project Leader; Deputy Project Leader; Project Manager; Deployment Board Chair; Technical Coordinator; User Board Chair; LCG Liaison; CERN Liaison; EU Liaison; Budget Holder; Network Liaison; NGS Liaison; Production Manager; Outreach.
Names (in slide order): Steve Lloyd or Replacement; Dave Britton; (John Gordon); Sarah Pearce; Steve Lloyd; Tony Doyle; (John Gordon); (Tony Cass); (Robin Middleton); (Pete Clarke); (Jeremy Coles); Sarah Pearce; Dave Kelsey.
GridPP3 Deployment Board
In GridPP2, the Deployment Board is squeezed into a space already occupied by the Tier-2 Board, the D-TEAM and the PMB. Many meetings have been joint with one of these other bodies. Identity and function have become blurred.
In GridPP3, we propose a combined Tier-2 Board and Deployment Board with overall responsibility for deployment strategy to meet the needs of the experiments. In particular, this is a forum where providers and users formally meet.
It deals with:
1) Issues raised by the Production Manager which require strategic input.
2) Issues raised by users concerning the service provision.
3) Issues to do with Tier-1 to Tier-2 relationships.
4) Issues to do with Tier-2 allocations, service levels and performance.
5) Issues to do with collaboration with Grid Ireland and NGS.
GridPP3 DB Membership
1) Chair.
2) Production Manager.
3) Technical Coordinator.
4) Four Tier-2 Management Board chairs.
5) Tier-1 Board Chair.
6) ATLAS, CMS, LHCb representatives.
7) User Board Chair.
8) Grid Ireland representative.
9) NGS representative.
10) Technical people invited for specific issues.
The above list gives ~13 core members, 5 of whom are probably on the PMB.
There is a move away from the technical side of the current DB: it becomes a forum where the deployers meet each other and hear directly from the main users. The latter is designed to ensure buy-in by the users to strategic decisions.
LHC Hardware Requirements
GridPP Exploitation Review input: took the global hardware requirements and multiplied by the UK authorship fraction:
  ALICE 1%   ATLAS 10%   CMS 5%   LHCb 15%
Using authors in the denominator is problematic when not all authors (globally) have an associated Tier-1. Such an algorithm applied globally would not result in sufficient hardware.
GridPP has asked the experiments for requirements, and their input (relative to their global requirements) is:
  ALICE ~1.3%   ATLAS ~13.7%   CMS ~10.5%   LHCb ~16.8%
[Figure: scaling options built from (Global Requirements), (Global T1 author frac.), (Number of Tier-1s), ~50% and the UK authorship fraction]
Proposed Hardware
The input from the User Board was that the hardware requirements in the GridPP3 proposal should be:
- those defined by the LHC experiments; plus
- those defined by BaBar (historically well understood); plus
- a 5% provision for other experiments, at the Tier-2s only.
Hardware Costs
Kryder's Law for disk cost; Moore's Law for CPU cost.
Hardware costs are extrapolated from recent purchases. However, experience tells us there are fluctuations associated with technology steps, so there is significant uncertainty in the integrated cost.
The model must factor in:
- Operational life of equipment
- Known operational overheads
- Lead time for delivery and deployment
Hardware Costs: Tape
Tier-2 Allocations
- Take each experiment's CPU and disk requirements (from Dave Newbold).
- For each experiment, share out among the Tier-2s.
- For each Tier-2, share out among the institutes.
- Sum over experiments (this maintains the correct CPU/disk ratio).
Sharing is guided by:
- Size of local community (number of Ac/Ph/PP)
- Past delivery (KSI2K to date, disk usage last quarter)
- Current resources available
Tier-2 Shares
            Physicist FTEs   Existing resources 1Q06   Delivery to date    Disk used 1Q06   Summary
Tier-2      (LHC only)       KSI2K     TB       %      KSI2K hrs     %     TB      %        Min   Max   Ave
London       40    26%       1049.0    37.7    27%     1,348,236    39%    17.9   21%             39%   28%
NorthGrid    33    22%       1783.1   132.2    48%     1,229,271    36%    34.2   40%       22%   48%   36%
ScotGrid     14     9%        354.0    44.6    10%       187,443     5%    21.0   24%        5%   24%   12%
SouthGrid    66    43%        516.4    48.4    15%       661,080    19%    13.4   15%             43%   23%
Total       152              3702.5   262.9            3,426,030           86.6
~35%   ~10%   ~20%
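As a rough cross-check of the Summary columns, the sketch below recomputes the minimum, maximum and mean of the four percentage shares listed for each Tier-2. It assumes the Summary columns were derived from exactly these four percentages, which the slide does not state; the dictionary layout and the function name `summarise` are illustrative only.

```python
from statistics import mean

# Percentage shares per Tier-2, copied from the table in the order:
# (physicist FTEs, existing resources, delivery to date, disk used).
shares = {
    "London":    (26, 27, 39, 21),
    "NorthGrid": (22, 48, 36, 40),
    "ScotGrid":  (9, 10, 5, 24),
    "SouthGrid": (43, 15, 19, 15),
}


def summarise(metrics):
    """Return (min, max, mean) of one Tier-2's percentage shares."""
    return min(metrics), max(metrics), mean(metrics)


# Assumption: the slide's Min/Max/Ave columns summarise these four metrics.
summary = {name: summarise(m) for name, m in shares.items()}

for name, (lo, hi, ave) in summary.items():
    print(f"{name:10s} min={lo}% max={hi}% ave={ave:.1f}%")
```

Under that assumption the computed values agree with the Max and Ave columns shown on the slide to within rounding of the input percentages.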
Example
The CMS requirement in 2008 is 1800 KSI2K and 400 TB.
Tier-2 sharing matrix (PMB/Tier-2 Board):
            ATLAS   CMS    LHCb   Other
London      0.25    0.75   0.10   0.30
NorthGrid   0.50    0.00   0.20   0.40
ScotGrid    0.15    0.00   0.30   0.10
SouthGrid   0.10    0.25   0.40   0.20
Institute sharing matrix (Tier-2 Board):
            ATLAS   CMS    LHCb   Other
Brunel      0.00    0.10   0.00   0.15
Imperial    0.00    0.90   1.00   0.00
QMUL        0.70    0.00          0.60
RHUL        0.20    0.00          0.15
UCL         0.10    0.00          0.10
i.e. the Imperial allocation is 1800 KSI2K (400 TB) x 0.75 x 0.9 = 1215 KSI2K (270 TB).
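The two-stage sharing in this example can be sketched in a few lines. The numbers are the CMS column of each matrix as shown on the slide; the variable and function names are illustrative, and only the CMS column is used.

```python
# CMS requirement in 2008, from the slide.
cms_requirement = {"cpu_ksi2k": 1800, "disk_tb": 400}

# CMS column of the Tier-2 sharing matrix.
tier2_cms_share = {
    "London": 0.75, "NorthGrid": 0.00, "ScotGrid": 0.00, "SouthGrid": 0.25,
}

# CMS column of the institute sharing matrix (London institutes).
london_cms_share = {
    "Brunel": 0.10, "Imperial": 0.90, "QMUL": 0.00, "RHUL": 0.00, "UCL": 0.00,
}


def allocate(requirement, tier2_share, institute_share):
    """Scale every resource by the Tier-2 share, then by the institute share."""
    return {
        resource: round(amount * tier2_share * institute_share, 1)
        for resource, amount in requirement.items()
    }


london_alloc = {
    institute: allocate(cms_requirement, tier2_cms_share["London"], share)
    for institute, share in london_cms_share.items()
}

print(london_alloc["Imperial"])
```

For Imperial this reproduces the 1215 KSI2K and 270 TB quoted on the slide. Because the same fractions are applied to CPU and disk, each institute keeps the experiment's CPU/disk ratio.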
Hardware Costs (Agreed by CB)
CPU (KSI2K)           2007      2008      2009      2010      2011      2012
Requirement                     7560      10215     14522     18203     21708
Amount paid for       1559      2106      2994      3753      4476
Unit cost             £0.392k   £0.312k   £0.247k   £0.175k   £0.124k   £0.087k
Cost                  £612k     £656k     £740k     £656k     £553k     £0k
Total (inc. disk)     £1,163k   £1,295k   £1,383k   £1,282k   £1,120k   £0k
- Take the requirement in the following year (7560 KSI2K) and divide by the lifetime in years (4.85 for CPU, 3.9 for disk): 7560 / 4.85 = 1559 KSI2K.
- Multiply by the unit cost in that year (£0.392k/KSI2K) to get £612k.
- Similarly for disk.
- It is up to institutes how they spend it (new kit, replacement kit, central services, …).
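The CPU rows of this table can be approximately reproduced with the sketch below. It assumes each year's payment is based on the requirement listed for the following year, as the worked 2007 example implies; the pairing of years with requirements and the function name `yearly_payment` are assumptions, not something the slide states explicitly.

```python
CPU_LIFETIME_YEARS = 4.85  # nominal CPU lifetime quoted on the slide

# (payment year, requirement in the following year [KSI2K], unit cost [k£/KSI2K])
# Assumption: the first requirement (7560) belongs to 2008 and is paid for in 2007.
rows = [
    (2007, 7560, 0.392),
    (2008, 10215, 0.312),
    (2009, 14522, 0.247),
    (2010, 18203, 0.175),
    (2011, 21708, 0.124),
]


def yearly_payment(next_year_requirement, unit_cost_k, lifetime_years):
    """Return (capacity paid for, cost in k£) for one payment year."""
    paid_for = next_year_requirement / lifetime_years
    return paid_for, paid_for * unit_cost_k


results = [
    (year, *yearly_payment(req, cost, CPU_LIFETIME_YEARS))
    for year, req, cost in rows
]

for year, paid_for, cost_k in results:
    print(f"{year}: {paid_for:6.0f} KSI2K paid for, about £{cost_k:.0f}k")
```

The capacities match the "Amount paid for" row after rounding. The costs come out within a few k£ of the "Cost" row, which is consistent with the unit costs on the slide having been rounded to three decimal places.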
Crosscheck
A similar procedure is used to allocate manpower (by the Tier-2 Board).
Tier-2 Resources
In GridPP2 we paid for staff in return for provision of hardware, which is not a sustainable model. We need a transition to a sustainable model that generates sufficient (but not excessive) hardware and that institutes will buy into. Such a model should acknowledge that:
- We are building a Grid (not a computer centre).
- Historically, Tier-2s have allowed us to lever resources/funding.
- Tier-2s are designed to provide different functions and different levels of service from the Tier-1.
- Dual funding opportunities may continue for a while.
- Institutes may have strategic gain by continuing to be part of the "World's largest Grid".
Tier-2 Hardware
Model (for proposal) endorsed by CB:
- GridPP funds ~15 FTE at the Tier-2s.
- Tier-2 hardware requirements are defined by the UB request.
- GridPP pays the cost of purchasing hardware to satisfy the following year's requirements at the current-year price, divided by the nominal hardware lifetime (~4 years for disk; ~5 years for CPU).
E.g. 2253 TB of disk is required in 2008. In January 2007, this would cost ~£1.0k/TB. With a lifetime of 4 years, the 1-year value is 2253 TB x £1.0k/TB / 4 = £563k.
Note: This does not necessarily reimburse the full cost of the hardware because, in subsequent years, the money GridPP pays depreciates with the falling cost of hardware, whereas the Tier-2s that actually made a purchase have been locked into a cost determined by the purchase date. However, GridPP does pay the cost up to 1 year before the actual purchase date, and institutes which already own resources can delay the spend further.
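A minimal sketch of the payment rule, followed by an illustration of the note about incomplete reimbursement. The first part uses the slide's disk example. The second part is purely hypothetical: the fixed 1000 TB capacity and the 30% annual price drop are invented for illustration, and the sketch ignores both requirement growth and the fact that GridPP pays up to a year early.

```python
def one_year_value(required_capacity, unit_cost_k, lifetime_years):
    """Annual payment in k£: capacity at this year's price, spread over the lifetime."""
    return required_capacity * unit_cost_k / lifetime_years


# Slide example: 2253 TB of disk at ~1.0 k£/TB with a 4-year lifetime.
disk_2008 = one_year_value(2253, 1.0, 4)
print(f"Disk payment: £{disk_2008:.2f}k")

# Hypothetical illustration of the note (all four numbers are assumptions).
capacity_tb = 1000          # fixed capacity bought once, in year 0
purchase_price_k = 1.0      # k£/TB at purchase
annual_price_factor = 0.7   # assumed 30% price drop per year
lifetime = 4                # years

purchase_cost = capacity_tb * purchase_price_k
reimbursed = sum(
    one_year_value(capacity_tb, purchase_price_k * annual_price_factor**year, lifetime)
    for year in range(lifetime)
)
print(f"Paid by institute: £{purchase_cost:.0f}k, reimbursed over lifetime: £{reimbursed:.2f}k")
```

With falling prices, the payments summed over the lifetime are smaller than the original purchase cost, which is the effect the note describes.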
Tier-2 Resources
Sanity checks:
1) Apply the model and compare the cost of hardware at the Tier-1 and Tier-2, integrated over the lifetime of the project:
                           Tier-1   Tier-2
   CPU  (k£/KSI2K-year)    0.070    0.045
   Disk (k£/TB-year)       0.144    0.109
   Tape (k£/TB-year)       0.052
2) Total cost of ownership: compare the total cost of the Tier-2 facilities with the cost of placing the same hardware at the Tier-1 (assuming that doubling the Tier-1 hardware requires a 35% increase in staff). Including staff and hardware, the cost of the Tier-2 facilities is ~80% of the cost of an enlarged Tier-1.
Running Costs (Work in progress)
Total Hardware Cost
In addition to ~£1.6m of GridPP2 money: likely to be problematic!
Tier-1 Service
Tier1 Centres provide a distributed permanent back-up of the raw data, permanent storage and management of data needed during the analysis process, and offer a grid-enabled data service. They also perform data-intensive analysis and re-processing, and may undertake national or regional support tasks, as well as contribute to Grid Operations Services. [LCG MoU]
The exact role of the Tier-1 varies from experiment to experiment and is described in detail in the individual experiments' TDRs. Broadly, however, the Tier-1 will carry out the following tasks:
- acceptance of an agreed share of raw data from the Tier0 Centre, keeping up with data acquisition;
- acceptance of an agreed share of first-pass reconstructed data from the Tier0 Centre;
- acceptance of processed and simulated data from other centres of the WLCG;
- recording and archival storage of the accepted share of raw data (distributed back-up);
- recording and maintenance of processed and simulated data on permanent mass storage;
- provision of managed disk storage providing permanent and temporary data storage for files and databases;
- provision of access to the stored data by other centres of the WLCG …
- operation of a data-intensive analysis facility;
- provision of other services according to agreed Experiment requirements;
- ensure high-capacity network bandwidth and services for data exchange with the Tier0 Centre, as part of an overall plan agreed amongst the Experiments, Tier1 and Tier0 Centres;
- ensure network bandwidth and services for data exchange with Tier1 and Tier2 Centres, as part of an overall plan agreed amongst the Experiments, Tier1 and Tier2 Centres;
- administration of databases required by Experiments at Tier1 Centres.
All storage and computational services shall be grid enabled according to standards agreed between the LHC Experiments and the regional centres.
Experiment   Tier-0                                Tier-1                                                        Tier-2
ALICE        First-pass scheduled reconstruction   Reconstruction; on-demand analysis                            Central simulation; on-demand analysis
ATLAS                                              Reconstruction; scheduled analysis / skimming; calibration    Simulation; on-demand analysis; calibration
CMS                                                Reconstruction; scheduled analysis / skimming                 Simulation; on-demand analysis; calibration
LHCb                                               Reconstruction; on-demand analysis; scheduled skimming        Simulation
Tier-1 Growth
                          Now       Start of GridPP3   End of GridPP3
Spinning disks            ~2000     ~10,000            ~20,000
Yearly disk failures      30-45     200-300?           400-600?
CPU systems               ~550      ~1800              ~2700
Yearly system failures    35-40     120-130?           180-200?
To achieve the levels of service specified in the MoU, a multi-skilled incident response unit (3 FTE) is proposed. This is intended to reduce the risk of over-provisioning other work areas to cope with long-term fluctuations in fault rate. These staff will have an expectation that their primary daily role will be dealing with what has gone wrong. They will also provide the backbone of the primary callout team.
Tier-1 Staff
Work Area                   GridPP3 PPARC funding   CCLRC funding
CPU                         2.0                     0.0
Disk                        3.0                     0.0
Tape Service (CASTOR)       2.0                     1.3
Core Services               1.0                     0.5
Operations                  3.0                     1.0
Incident Response Unit      3.0                     0.0
Networking                  0.0                     0.5
Deployment                  1.5                     0.0
Experiment Support          1.5                     0.0
Tier-1 Management           1.0                     0.3
Totals                      18.0                    3.6
Tier-2 Service
- provision of managed disk storage providing permanent and/or temporary data storage for files and databases;
- operation of an end-user analysis facility;
- provision of other services, such as simulation, according to agreed Experiment requirements;
- provision of network services for data exchange with Tier1 Centres, as part of an overall plan agreed between the Experiments and the Tier1 Centres concerned.
All storage and computational services shall be grid enabled according to standards agreed between the LHC Experiments and the regional centres.
The following services shall be provided by each of the Tier2 Centres in respect of the LHC Experiments that they serve, according to policies decided by these Experiments:
                             Maximum delay in responding      Average availability
                             to operational problems          measured on an annual basis
Service                      Prime time     Other periods
End-user analysis facility   2 hours        72 hours          95%
Other services               12 hours       72 hours          95%
Grid Deployment Staff (Operations)
Team of 8: a Production Manager; 4 Tier-2 Coordinators; 3 GOC staff.
Their activities include:
- Resource and deployment planning and scheduling upgrades
- Installation and configuration of Grid middleware services
- Support of these Grid services
- Grid Operations
- User support
- System manager support
- Monitoring, accounting and auditing
- Security (both operational and policy aspects)
- Documentation
- VO management and support
GOC staff: APEL world-wide accounting; GOC DB; ROC Manager (including GGUS support).
Grid Support Staff
GridPP Staff Evolution
Dissemination
  4. The bid(s) should: a) show how developments build upon PPARC's existing investment in e-Science and IT investment, leverage investment by the e-science Core programme and demonstrate close collaboration with other science and industry and with key international partners such as CERN. It is expected that a plan for collaboration with industry will be presented or justification if such a plan is not appropriate.
For the exploitation review it was assumed that dissemination would be absorbed by PPARC. Unlikely at this point!
Presently we have effectively 1.5 FTE working on dissemination alone (Sarah Pearce plus an events officer).
We want to maintain a significant dissemination activity (insurance policy), so adding in industrial liaison suggests maintaining the level at 1.5 FTE.
Full Proposal
Compares with the exploitation review input of £36,643k, which included £1,800k of running costs.
Status
- GridPP3 proposal being drafted (deadline July 13th).
- Currently being run past the CB (by email) and the OC (on Friday).
- Request the hardware defined by the experiments.
- Request the (minimum) staff we think are required.
- Expect some iteration!