GridPP3 David Britton 6/September/2006
Overview
The GridPP3 proposal consists of a 7-month extension to GridPP2, followed by a three-year GridPP3 project starting in April 2008.
GridPP2+ (7-month extension from September 2007 to March 2008)
- Early approval sought in order to ensure staff retention.
- Provides continuity of management and support over the LHC start-up.
- Aligns the project with (a) the financial year; (b) EGEE and other EU projects.
GridPP3 (3-year project from April 2008 to March 2011)
- From production to exploitation.
- Delivers large-scale computing resources in a supported environment.
- Underpins the success of the UK contribution to the LHC.
Global Context
[Timeline figure: EDG, EGEE-I, EGEE-II, EGI?; GridPP1, GridPP2, GridPP3; LHC data taking.]
[Diagram labels: GridPP, EDG, EGEE, LCG/wLCG, (many) evolving standards, developing requirements, changing costs and budgets, experience.]
WLCG MoU
17 March 2006: PPARC signed the Memorandum of Understanding with CERN.
Commitment to UK Tier-1 at RAL and the four UK Tier-2s to provide services and resources.
Current MoU signatories: China, France, Germany, Italy, India, Japan, Netherlands, Pakistan, Portugal, Romania, Taiwan, UK, USA.
Pending signatures: Australia, Belgium, Canada, Czech Republic, Nordic, Poland, Russia, Spain, Switzerland, Ukraine.
Grid Overview
Aim: by 2008 (full year's data taking)
- CPU ~100 MSi2k (100,000 CPUs)
- Storage ~80 PB
- Involving >100 institutes worldwide
- Build on complex middleware in Europe (gLite) and in the USA (VDT)
1. Prototype went live in September 2003 in 12 countries.
2. Extensively tested by the LHC experiments in September [...].
3. [...] sites, 13,797 CPUs, 5 PB storage in September [...].
4. [...] active sites, 26,527 CPUs, 10 PB storage in September 2006.
Tier-0 to Tier-1: worldwide data transfers >950 MB/s for 1 week; peak transfer rate from CERN of >1.6 GB/s. Ongoing experiment transfers as part of current service challenges.
Tier-1 to Tier-2: UK data transfers >1000 Mb/s for 3 days; peak transfer rate from RAL of >1.5 Gb/s. Require high data rate transfers ([...] Mb/s) to/from RAL as a routine activity.
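To put the sustained rates quoted on the two transfer slides in perspective, the following minimal sketch converts them into total data volumes. It assumes decimal units (1 TB = 10^6 MB), 8 bits per byte, and treats the quoted lower bounds as if they were exact averages, so the results are rough lower bounds and not measured totals. The function names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def megabits_to_megabytes(rate_mbit_per_s):
    """Convert a rate in Mb/s (megabits) to MB/s (megabytes)."""
    return rate_mbit_per_s / 8


def volume_tb(rate_mb_per_s, days):
    """Total volume in TB moved at a constant rate (MB/s) over a number of days."""
    return rate_mb_per_s * days * SECONDS_PER_DAY / 1e6


# Tier-0 to Tier-1: >950 MB/s sustained for 1 week.
t0_t1_tb = volume_tb(950, 7)

# Tier-1 to Tier-2: >1000 Mb/s (note: megabits) sustained for 3 days.
t1_t2_tb = volume_tb(megabits_to_megabytes(1000), 3)

print(f"Tier-0 to Tier-1: at least {t0_t1_tb:.1f} TB in one week")
print(f"Tier-1 to Tier-2: at least {t1_t2_tb:.1f} TB in three days")
```

The two slides use different units (MB/s versus Mb/s), which is why the second rate is divided by 8 before the comparison.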
It's in use: Active Users by LHC experiment: ALICE (8), CMS (150), ATLAS (70), LHCb (40)
Tier Centres
LHC Hardware Requirements
ALICE: Based on UK M&O author fraction (1.2%).
ATLAS: Based on UK fraction of Tier-1 authors.
CMS: Based on a threshold size for a minimum viable Tier-1.
LHCb: Based on authorship fraction (16.5%) and number of Tier-1s.
Overall resource level reviewed by LHCC.
Balance of CPU, Storage, and Network driven by computing models.
Non-LHC Hardware Requirements
BaBar: Included explicitly, based on the well-understood resource requirement per fb⁻¹ and the expected luminosity profile up to October [...]. Level is ~15% of Tier-1 CPU and Tape, and 9% of Disk, in [...].
UKQCD: Request received after the planning stage was completed, so not included in the model. (Some uncertainty over whether UKQCD will move to an LCG-based Grid and how manpower would be funded.) Level is 3%-4% of Tier-2 resources and ~7% of Tier-1 tape in [...].
Others: The requirements of other, smaller user groups, and some provision for future larger groups (LC, Neutrino) whose requirements are currently largely unknown, have been addressed with a 5% envelope allocation of Tier-2 Disk and CPU, and Tier-1 Tape.
Budget Overview
Tier-1 Centre
Defined by the experiment hardware requirements, the experiment computing models, a hardware costing model, and by the service levels defined in the international MOU signed by PPARC.
[Figure: Estimated Tier-1 peak data flows in 2008 (MB/s)]
Tier-1 Centre: Service Level
Tier-1 Centre: Staff
Core services refer to user file systems, monitoring, software deployment and the conditions database.
Operations refers to machine-room environment, hardware diagnostics/repair, automation, fabric management, tape movement, etc.
Incident Response Unit addresses the MOU service requirement, including out-of-hours call-out.
Tier-2 Centres
GridPP has successfully developed four distributed Tier-2 Centres which have:
- Engaged the institutes;
- Leveraged large amounts of resources;
- Developed local expertise;
- Stimulated cross-disciplinary relationships;
- Helped promote the Grid, GridPP, Particle Physics, and the local groups within the universities.
Successes: development of a regional management structure; MOU signed by each institute with GridPP; deployment of complex middleware; accounting; security; data transfers; all fully operational and contributing to LCG.
Tier-2 Centres
To match the LHC computing models, around 50% of the UK computing resources will be located at the Tier-2s.
Service levels are not as demanding as at the Tier-1.
The distributed nature of the UK Tier-2s has technical advantages (divide and conquer) and technical drawbacks (inefficiencies).
The importance of political/social aspects should not be underestimated.
Tier-2 Market Model
1) Assume all institutes involved are interested in building on their current contribution, so that…
2) Effectively a market exists to provide Tier-2 resources to HEP (because many institutes have dual-funding opportunities and/or internal reasons to be involved).
3) GridPP offers a market price for Tier-2 resources which institutes may or may not choose to accept.
4) The market price is adjusted to optimise resources obtained.
5) The market price is bounded by what it would cost to provision the resources at the Tier-1.
Inefficiencies associated with the distributed nature of the Tier-2s may be balanced by an increase in competition/leverage.
Tier-2 Hardware Allocations
Constrained by the requirement for institutional JeS forms, GridPP made an initial mapping (or allocation, i.e. not quite the market approach intended) of Tier-2 hardware.
Allocations based on past delivery, current size, and size of the local community of physicists.
[Tables: fraction of each experiment allocated to each Tier-2; relative fraction of each experiment allocated to each institute within the Tier-2.]
Tier-2 Staff Allocations
GridPP currently funds 9 FTE at 17 institutes. In GridPP3, this is proposed to increase to [...] FTE (cf. the Tier-1, which has 18 FTE funded by GridPP3 for a comparable amount of hardware).
Again, in this market approach this is the effort (currently) offered and not an estimate of the full effort needed.
Tier-2 Hardware Costs (Agreed by CB)
CPU (KSI2K), six yearly columns (the column headings and the "Requirement" and "Amount paid for" rows are not preserved):
Unit Cost (per KSI2K):  £0.392k  £0.312k  £0.247k  £0.175k  £0.124k  £0.087k
Cost:                   £612k    £656k    £740k    £656k    £553k    £0k
Total (inc. Disk):      £1,163k  £1,295k  £1,383k  £1,282k  £1,120k  £0k
Method: take the requirement in the following year (7560), divide by the lifetime in years (4.85 for CPU, 3.9 for Disk) to give 1559, then multiply by the unit cost in that year (£0.392k/KSI2K) to give £612k. Similarly for disk.
It is up to institutes how they spend it (new kit, replacement kit, central services … ).
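The costing rule on this slide can be reproduced with a few lines of arithmetic. The sketch below is only an illustration using the figures quoted on the slide; the function and variable names are invented here. With the rounded inputs shown, the product comes out at about £611k, marginally below the £612k on the slide, presumably because the slide's own inputs were rounded for display.

```python
CPU_LIFETIME_YEARS = 4.85   # from the slide
DISK_LIFETIME_YEARS = 3.9   # from the slide; disk is costed the same way


def annual_hardware_cost(requirement_next_year, lifetime_years, unit_cost_k):
    """Return (amount paid for, cost in k GBP) for one year.

    The requirement in the following year is spread over the hardware
    lifetime, and that yearly slice is priced at the current unit cost.
    """
    amount_paid_for = requirement_next_year / lifetime_years
    return amount_paid_for, amount_paid_for * unit_cost_k


# First-year CPU example from the slide: 7560 KSI2K at 0.392 k GBP per KSI2K.
amount, cost_k = annual_hardware_cost(7560, CPU_LIFETIME_YEARS, 0.392)
print(f"Amount paid for: {amount:.0f} KSI2K, cost: about {cost_k:.0f}k GBP")
```

The same function applies to disk by passing `DISK_LIFETIME_YEARS` and the disk requirement and unit cost, which are not preserved in this transcript.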
Tier-2 Resources
Sanity checks:
1) Compare the cost to GridPP of hardware at the Tier-1 and Tier-2, integrated over the lifetime of the project.
[Table: Tier-1 and Tier-2 costs for CPU (K£/KSI2K-year), DISK (K£/TB-year) and TAPE (K£/TB-year); only the TAPE value 0.052 is preserved.]
2) Total cost to project: can compare the (Staff + Hardware) cost of the Tier-2 facilities with the cost to the project of placing the same hardware at the Tier-1 (assuming that doubling the Tier-1 hardware requires a 35% increase in staff).
Including staff and hardware, the cost of the Tier-2 facilities is ~80% of the cost of an enlarged Tier-1.
Budget Overview
Grid Support
Refers to staff effort for the support of Middleware, Security and Networking areas in GridPP3. The emphasis is on a managed transition from middleware development to middleware support (operational and bug-fixing).
Three criteria applied to guide prioritisation of areas for support:
1) Areas which are mission critical for the UK.
2) Areas which are viewed as established international obligations.
3) Areas which provide significant leverage to the obvious advantage of GridPP.
Background documents discuss areas in terms of:
a) Operational Support
b) Maintenance (bug-fixing)
c) Development (phased out where practical).
Grid Support Areas
Grid Support Staff Evolution
Grid Operations
Team of 8.5 FTE consisting of:
- 1 Production Manager;
- 4 Tier-2 Coordinators;
- 3 to run the UK/GridPP Grid Operations Centre (GOC);
- 0.5 FTE to coordinate technical documentation.
Responsible for the deployment, operation, and support of the UK Particle Physics environment.
The Production Manager is responsible for resolving technical and coordination issues that span the Tier-1 and Tier-2s, and for ensuring a stable production service with appropriate upgrades to improve functionality and quality.
The current GOC (5.5 FTE funded by EGEE) is responsible for monitoring the world-wide Grid operations, providing trouble tickets, accounting services, and administrative tools.
Operations Posts
Budget Overview
GridPP3 Structure [Diagram labels: Earth, Wind, Water, Fire]
Management Continuity [Diagram labels: TD, DB, SL, SP]
Outreach
Currently a Dissemination and an Events Officer (1.5 FTE).
Instructions in the PPARC call include the statement: "It is expected that a plan for collaboration with industry will be presented or justification if such a plan is not appropriate."
Therefore, broaden the mandate to include industrial liaison without increasing manpower, but add 0.5 FTE to this area from the current documentation officer to handle user documentation and website maintenance.
Overall team of 2 FTE responsible for:
- Dissemination activities (news, press releases, liaison with partners, etc.)
- Event organisation (demos, publicity, etc.)
- Industrial liaison (to be developed)
- Basic user documentation and website maintenance.
GridPP3 Posts
Travel and Other Costs
Based on experience in GridPP2, we have budgeted £3.5k per FTE per annum for travel, a reduction of about 10%, to cover collaboration meetings, national and international conferences and workshops, technical meetings, management meetings, etc.
Other Costs of £15k per annum have been included for outreach expenses and other operational expenses (licences, laptops, test machines, web server, software, etc.).
Total Costs [k£]
Risks
Working Allowance and Contingency
(a) 15% of Tier-1 HW (cost uncertainties).
(b) 4 FTE at Tier-2 (market approach).
(c) 15% of Tier-2 HW (cost uncertainties) + 15% (market approach).
(d) 2 FTE at Tier-1 (service level).
(e) 2 FTE at Tier-2 (service level).
Total Project Cost
Responses to Referee Questions
Exclusivity?
"There is clearly a compelling advantage for the physicists concerned to be aligned with and pool resources with the rest of the global alliance that comprises LCG. However, this does not need to be an exclusive alliance."
"... long-term operational costs, quality of service and interdisciplinary collaboration could surely be improved by a much more integrated and synergistic approach."
- GridPP has engaged with the wider community (and has reported this to PPARC through RCUK annual reports).
- GridPP's first Grid application was GEANT-based for LISA.
- The community is, however, focussed on its scientific priorities: the LHC start-up timescale provides the primary focus.
Outsourcing?
"... companies are developing expertise in service hosting and provision with many opportunities to develop experts, teams, resource management systems and operational/business knowledge."
- GridPP has engaged with BT (visits to hosting site in St Albans, meeting with BT management at IC) and discussed possibilities fully in the past.
- Recent IT outsourcing exercises at Bristol and Cambridge indicate that costs are prohibitive (but that these may be offset by a joint PR programme).
Novel? Original? Timely?
"... novelty is entirely inappropriate when the goal is a highly reliable, ubiquitous and always available e-Infrastructure"
"... similar undertakings of various scales are underway in many countries"
- GridPP notes that many of the methods used have not been tested at the required scale.
The LHC is likely to start producing data by 2007 and the proposed e-Infrastructure must be ready by that date if UK PP is to benefit from that data.
Relationships?
"... the PP grid community has not yet engaged in collaboration on standardising data replication, data location, caching and reliable data movement services."
- Globus RLS was based on earlier collaboration with EDG, including GridPP input.
- GridPP plans to include higher-level replication services, built on current expertise.
Reliable methods?
"In house development of middleware and tools is almost certainly inappropriate"
- GridPP agrees and, hence, the focus is on support and maintenance of existing components, with planned reductions in manpower.
- Appendix A2 (Middleware Support Planning Document) expands upon the identified components as either mission critical to UK exploitation, part of the UK's input in the wider international context, or areas where it is possible to demonstrate leverage.
Industrial relevance?
"... significant technology transfer depends on long-term and sustained collaboration where mutual understanding develops and co-adaptation follows"
- GridPP agrees: we are proposing a dedicated 0.5 FTE in this area and believe this will represent good value at this level.
Viability?
"There is a significant risk that the gLite stack will prove incapable of development for large scale, wide-spread and production QoS use. It is already very complex."
- GridPP agrees that there is a risk, but the expanded use of gLite across an ever-increasing infrastructure indicates that these problems are being overcome.
"It is better than it was but it by no means free from risk and misdirection."
Planning?
"The proposal states that 'A future EGI project, including particle physics as one of the leading applications, may have started.' There are other future scenarios. One is the model already used in GÉANT."
- GridPP agrees that, e.g., UKERNA could have been asked to manage the Grid, but this is not currently planned.
- Our intention is to (continue to) engage fully with the NGS and other bodies, as discussed in appendix A7 (National Context Document).
Planning?
"I would strongly recommend that a production e-Infrastructure project should not use bespoke software."
- GridPP agrees: the reference was to experiment-specific code that is currently necessary to fill gaps in the middleware.
"It is essential to separate all forms of maintenance, especially bug fixing and improvements from operations and to conduct it in a software engineering environment with strict quality controls, testing and release procedures."
- GridPP agrees: the quality controls, testing and release procedures are of a high standard.
Planning?
"It is clear that a production service team should draw on others who should develop such services, not develop them themselves. … It is probably necessary to carry on some aspects of the above work, but these require very careful selection and they should be collaborative with other disciplines and grid projects, and include strategies where the development and maintenance is eventually handed over to others."
- GridPP agrees: in the GridPP3 proposal we discuss a very limited subset of maintenance and support developments that were proven to be necessary (and were effective) in the past or can be envisaged to be required in future.
- cf. Storage management is an area where there is already good international collaboration, led by the PP community, on standards and implementations using the SRM specifications.
Past effectiveness?
"The previous two GridPP projects have taken on demanding and challenging engineering, management and R&D tasks. They have been exceptionally successful, as establishing and running grid services on this scale requires world-leading innovation. This has required professional leadership and adept collaboration. There is plenty of evidence of their ability and the advent of LHC data will guarantee their motivation. Their particular strengths are in service management, deployment and operation on a global scale."
- GridPP agrees.
Suitability
"The two previous GridPP projects have demonstrated that they are capable of recruiting, sustaining and managing such a multi-site team. There is likely to be a substantial carry forward of the GridPP2 team. Can you quantify the level of continuity that the project depends on and the assessment of the risk that this continuity will not be met?"
- GridPP agrees: there is a significant risk that the current expertise will be lost due to planning uncertainty. This was addressed in the proposal by the request for early approval of the GridPP2 continuation component.
Reduce number of Tier-2 sites?
"It might be helpful to review carefully whether long-term savings can be made by concentrating Tier-2 resources over fewer sites. Currently table 10 shows 17 sites for Tier-2 resources. Is there really a case for resources at each of these sites?"
- All institutes have delivered on their past MoU commitments (past performance was factored into the proposed sharing of Tier-2 resources).
- If PPARC chose to invest at a small subset of sites, then significant long-term buildings and infrastructure investment would be required (that has not been planned). In addition, the utility costs of these would be exposed (currently hidden).
- If PPARC chose to select a larger subset of sites, there would be limited gains.
- Possibly leveraging SRIF funding is a consideration.
Cost-effectiveness
"... matching funding is not a justification" (for the 7-month GridPP2 continuation in the context of EGEE-II)
- The main case is built upon GridPP2 completing its mission to establish a Production Grid prior to LHC data-taking mode.
- This enables retention of key staff whilst planning for the Exploitation phase in GridPP3.
Code efficiency improvements?
"How do you trade between investing in software engineering to improve code performance against investing in more CPU?"
- LHC experiment codes are already highly optimised for the complex data analysis required.
- There is significant investment in the optimisation effort within the experiments, and the requirements take into account future optimisations.
- The optimisations take account of the (distributed) Grid computing constraints.
Usage increases?
"... use by a much larger community intent on individual analyses requires further justification. How do you demonstrate this community will actually engage and actually generate this additional use?"
- The experiment requirements anticipate increasing analysis across the experiments.
- This is quantified by experiment in the proposal appendices.
2. ALICE Computing Technical Design Report, 114pp.
3. ATLAS Computing Technical Design Report, 248pp.
4. CMS: The Computing Project Technical Design Report, 169pp.
5. LHCb Computing Technical Design Report, 117pp.
Data Management?
"Companies such as Oracle and IBM supply well-honed distributed database technologies capable of high volume and high throughput. Developing PP-specific and home grown solutions is very unlikely to be cost effective."
- Oracle are fully incorporated into LCG planning, with (low-cost) worldwide Oracle database services used for core metadata functions.
Tier-2 additional support?
"Table 12 appears to identify an anomaly that suggests that the plan is not as cost effective as it should be."
Tier-2 support effort is currently cross-subsidised through:
1. the PP rolling grant programme;
2. institute (e.g. computing service) support.
- Component 1 was anticipated not to be viable.
- Component 2 was modest, but is expected to continue at about this level.
- We have requested Contingency to cover the possibility that component 2 is not preserved (15% on the hardware cost, in addition to another 15% that covers the future price uncertainty; plus an additional 4 FTE, 1 at each Tier-2).
- We have also requested Working Allowance of an additional 2 FTE at Tier-2s, to be used if the service level falls short.
Context Planning?
"The development of this interdependency and cooperation should be explicitly planned and specified from the start of GridPP3."
e.g. "forms part of the National e-Infrastructure": what part? "CA": LCG uses one system… "training": What source of training is this?
- All plans are integrated with NGS and EGEE in these areas and expanded upon in appendix A7 (National Context Document).
Overall Scientific Assessment
"This proposal is fundable and should be funded. Because of its significance to an extensive research community a decision to proceed should be made quickly."
- GridPP agrees.
- The outline answers provided to the referees' questions are provided in anticipation of such a PPRP decision.
Referee 2
Proposal Details: Reference number: PP/E00296X/1; Grant panel: Projects peer review panel; Grant type: Standard.
The Proposal:
Science quality: I really cannot comment on the pure science, not being a particle physicist. The proposal itself deals with deploying and operating a production GridPP, and as such is mostly infrastructural engineering and computer science of a software engineering flavour, rather than pure research. This is as it should be for a proposal of this type. In this sense the proposal is of a high quality. It is of course worthwhile in that it will be impossible for the UK particle physics community to fully engage with the LHC without GridPP3.
Objectives: The grand objectives are clear enough in the executive summary, the more detailed objectives are distributed throughout the proposal, and perhaps could benefit from a summary tabulation. The objectives are sound but ambitious to an extent that perhaps threatens availability.
Management: Based on GridPP2, appears to work well.
Program Plan: Timescales & milestones hard to find.
Significance: This is a very significant infrastructure for the future of particle physics in the UK.
cf. Other Work: GridPP has performed very well in the EU context, and also in experimental transatlantic work, and is a central partner in EGEE. The proposed infrastructure is a part of an overall global grid required for LHC.
Methodology: A continuation and expansion from GridPP2, and likely to be successful if the manpower resources are adequate to the task.
Industry: Limited proposals.
Planning: The related planning documents exhibit a good degree of coherency.
Past Record: The past performance has been good to excellent.
Suitability: Very suitable.
Project Plan?
"Timescales & milestones hard to find."
- The intention is to use the project management methods used (successfully) in GridPP1 and GridPP2.
- The approach taken to GridPP3 is different to that of GridPP1(2) planning.
- A set of high-level deliverables can be prepared in the light of PPRP feedback, if requested.
Backup Slides
GridPP2 ProjectMap
Convergence with NGS
- The slow emergence of real web-services solutions means that this will probably not be completed during GridPP2.
- GridPP is committed to gLite, and NGS intends to be compatible with this but cannot deploy the full gLite stack.
- The GridPP collaboration is discussing formal affiliation with NGS; presently Edinburgh are NGS affiliates, and Oxford, RAL, Manchester, and Lancaster are partners. Discussions are underway with Glasgow, UCL, and IC.
In the Beginning…
The UK Grid for HEP really started to grow in 2000 with the release of the Hoffman report into LHC computing requirements and the results of the UK Government Spending Review (SR2000), which targeted £80m for e-Science.
[Funding diagram:]
- £80m Collaborative projects: Generic Challenges, EPSRC (£15m), DTI (£15m); Industrial Collaboration (£40m)
- Academic Application Support Programme: Research Councils (£74m), DTI (£5m)
- PPARC (£26m), BBSRC (£8m), MRC (£8m), NERC (£7m), ESRC (£3m), EPSRC (£17m), CLRC (£5m)
Hardware Costs
[Plots: Kryder's Law for disk cost; Moore's Law for CPU cost]
Hardware costs extrapolated from recent purchases. However, experience tells us there are fluctuations associated with technology steps. Significant uncertainty in integrated cost.
Model must factor in:
- Operational life of equipment
- Known operational overheads
- Lead time for delivery and deployment.
Hardware Costs: Tape
Running Costs (Work in progress)
Tier-1 Growth
                        Now     Start of GridPP3   End of GridPP3
Spinning Disks          ~2000   ~10,000            ~20,000
Yearly disk failures    [...]   ?                  ?
CPU Systems             ~550    ~1800              ~2700
Yearly system failures  [...]   ?                  ?
To achieve the levels of service specified in the MOU, a multi-skilled incident response unit (3 FTE) is proposed. This is intended to reduce the risk of over-provisioning other work areas to cope with long-term fluctuations in fault rate. These staff will have an expectation that their primary daily role will be dealing with what has gone wrong. They will also provide the backbone of the primary callout team.
Tier-2 Allocations
- Take each experiment's CPU and Disk requirements (from Dave Newbold).
- For each experiment, share out among Tier-2s.
- For each Tier-2, share out among institutes.
- Sum over experiments (maintains the correct CPU/Disk ratio).
Sharing guided by:
- Size of local community (number of Ac/Ph/PP)
- Past delivery (KSI2K to date, Disk usage last quarter)
- Current resources available
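Tier-2 Shares
[Table columns: Tier-2; Physicists FTEs (LHC only); Existing Resources 1Q06 (KSI2K, TB); Delivery to date (KSI2K hrs); Disk used 1Q06 (TB); Summary (Min, Max, Ave). The Existing Resources values and some other cells are not preserved.]
Tier-2      FTEs  Share  Delivery (KSI2K hrs)  Share  Disk used (TB)  Share  Min    Max  Ave
London      40    26%    1,348,236             39%    17.9            21%    [...]  39%  28%
NorthGrid   33    22%    1,229,271             36%    34.2            40%    22%    48%  36%
ScotGrid    14    9%     187,443               5%     21.0            24%    5%     24%  12%
SouthGrid   66    43%    661,080               19%    13.4            15%    [...]  43%  23%
Total row (mostly not preserved): ~35%, ~10%, ~20%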
Example
CMS requirement in 2008 is 1800 KSI2K and 400 TB.
Tier-2 sharing matrix: rows London, NorthGrid, ScotGrid, SouthGrid; columns ATLAS, CMS, LHCb, Other [values not preserved].
Institute sharing matrix: rows Brunel, Imperial, QMUL, RHUL, UCL; columns ATLAS, CMS, LHCb, Other [values not preserved].
(PMB/Tier-2 Board) (Tier-2 Board)
i.e. the Imperial allocation is 1800 KSI2K (400 TB) x 0.75 x 0.9 = 1215 KSI2K (270 TB).
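The two-level sharing in this example can be written out as a short calculation. This is a minimal sketch using only the numbers on the slide; it assumes that 0.75 is London's share of CMS and 0.9 is Imperial's share within London (the slide does not label the factors, and the product is the same either way). The names below are illustrative, not taken from any GridPP tool.

```python
def institute_allocation(requirement, tier2_share, institute_share):
    """Experiment requirement scaled by the Tier-2 share, then the institute share."""
    return requirement * tier2_share * institute_share


# CMS requirement in 2008, from the slide.
CMS_2008 = {"cpu_ksi2k": 1800, "disk_tb": 400}

# Assumed meaning of the two factors (not labelled on the slide).
LONDON_SHARE_OF_CMS = 0.75
IMPERIAL_SHARE_OF_LONDON = 0.9

# Applying the same shares to CPU and disk preserves the CPU/Disk ratio.
imperial = {
    resource: institute_allocation(
        amount, LONDON_SHARE_OF_CMS, IMPERIAL_SHARE_OF_LONDON
    )
    for resource, amount in CMS_2008.items()
}

print(f"Imperial CMS allocation: {imperial['cpu_ksi2k']:.0f} KSI2K, "
      f"{imperial['disk_tb']:.0f} TB")
```

An institute's total allocation would then be the sum of such terms over all experiments, as described on the Tier-2 Allocations slide.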
Crosscheck:
Tier-2 Staff
Proposal Procedure
[Flow diagram comparing the two procedures.]
GridPP1/GridPP2: Proposal (Tier-1 £Am, Tier-2 £Bm, Middleware £Cm, Applications £Dm, Management £Em, …, Total £Xm) -> Peer Review -> Re-evaluation (Tier-1 £am, Tier-2 £bm, Middleware £cm, Applications £dm, Management £em, …, Total £Ym) -> Apply for Grants (Institute 1 £fm, Institute 2 £gm, Institute 3 £hm, Institute 4 £im, Institute 5 £jm, …, Total £Ym).
GridPP3: Proposal (Tier-1 £Am, Tier-2 £Bm, Middleware £Cm, Applications £Dm, Management £Em, …, Total £Xm; also by institute: Institute 1 £Fm, Institute 2 £Gm, Institute 3 £Hm, Institute 4 £Im, Institute 5 £Jm, …, Total £Xm) -> Peer Review -> Allocate (Institute 1 £fm, Institute 2 £gm, Institute 3 £hm, Institute 4 £im, Institute 5 £jm, …, Total £Ym).
Is this still a sensible project?
GridPP3 Deployment Board
In GridPP2, the Deployment Board is squeezed into a space already occupied by the Tier-2 Board, the D-TEAM, and the PMB. Many meetings have been joint with one of these other bodies. Identity and function have become blurred.
[Diagram label: Project Management Board]
In GridPP3, propose a combined Tier-2 Board and Deployment Board with overall responsibility for deployment strategy to meet the needs of the experiments. In particular, this is a forum where providers and users formally meet.
Deals with:
1) Issues raised by the Production Manager which require strategic input.
2) Issues raised by users concerning the service provision.
3) Issues to do with Tier-1 to Tier-2 relationships.
4) Issues to do with Tier-2 allocations, service levels, performance.
5) Issues to do with collaboration with Grid Ireland and NGS.
GridPP3 DB Membership
1) Chair
2) Production Manager
3) Technical Coordinator
4) Four Tier-2 Management Board chairs
5) Tier-1 Board Chair
6) ATLAS, CMS, LHCb representatives
7) User Board Chair
8) Grid Ireland representative
9) NGS representative
10) Technical people invited for specific issues
The above list gives ~13 core members, 5 of whom are probably on the PMB. There is a move away from the technical side of the current DB, and it becomes a forum where the deployers meet each other and hear directly from the main users. The latter is designed to ensure buy-in by the users to strategic decisions.
Grid Data Management
Components: File transfer services. Metadata catalogues. Services to manage the replication of data.
Operational Support: FTS; metadata catalogues as they are deployed; replica optimisation services eventually.
Maintenance: Metadata services and eventually replica optimisation services.
Development: Common metadata services; replica optimisation.
Storage Management
Components: DPM (used at 12 Tier-2 sites in UK); dCache (used at Tier-1 and 7 Tier-2 sites in UK); CASTOR SRM1 (Tier-1, but to be phased out in 2006); CASTOR SRM2 (Tier-1, primary developer).
Operational Support: All above components. Hope to reduce number.
Maintenance: GridPP owns dCache installation and configuration scripts within LCG, and the SRM2 interface to CASTOR.
Development: None envisaged in GridPP3 era. However, SRM version 3 may impose some requirements.
Information and Monitoring
Components: R-GMA (information system slated to replace the BDII); Service Discovery (SD); APEL accounting (uses R-GMA); GLUE Schema (information model to define Grid resources).
Operational Support: R-GMA.
Maintenance: R-GMA and SD.
Development: R-GMA may still require development at start of GridPP3. GLUE schema likely to require ongoing development (minor effort).
Workload, Performance and Portal
Components: WMS (Resource Broker, Logging & Bookkeeping server, etc.); tools to gather job information (used by ATLAS, CMS, and the RTM); Real Time Monitor (RTM); GridPP Portal.
Operational Support: WMS; job information repository; job information analysis.
Maintenance: WMS testing, job information scripts, RTM, Portal.
Development: Portal (to address needs of new users); job information scripts (to enrich/optimise content); possibly RTM, if evolution is still required/desired.
Security
Components: GridSite Toolkit (includes Grid Access Control Language GACL and GridSite's Apache extension mod_gridsite, both used by ATLAS and CMS); VOMS.
Operational Support: GridSite and VOMS. Operational Security Officer post. International Security Coordination post.
Maintenance: GridSite.
Development: GridSite.
Networking
Components: High-level contacts with JISC and UKERNA; requirements and provisioning; work with providers in respect to interfaces to Grid and Network operations; network monitoring and diagnostic tools.
Operational Support: Network monitoring and diagnostics.
Maintenance: Minor.
Development: None.
Active Users (All VOs)
Active Users by LHC experiment: ALICE (8), CMS (150), ATLAS (70), LHCb (40)
Job success? Overview
Job Success by LHC experiment: ALICE, CMS, ATLAS, LHCb