Slide 1: Climbing Hills
Prof. David Britton, GridPP Project Leader, University of Glasgow
GridPP25 Collaboration Meeting, 26th August 2010
Slide footer: David Britton, University of Glasgow, IET, Oct 09
Slide 2: Grid Growth
Figure panels: “Last Week”, “Last Month”, “Last Year”, “Last 5 years”
Slide 3: A Grid for All
Figure panel: “Last Quarter”
Slide 4: Step by Step
Figure labels: “Last 5 years”, “Next 5 years”, Finishing Line
GridPP1: “From Web to Grid”
GridPP2: “From Prototype to Production”
GridPP3: “From Production to Exploitation”
GridPP4: “Computing in the LHC era”
Slide 5: GridPP4 Time Line (slide footer: GridPP24)
Nov 5th – Invitation to bid.
Dec 10th – Face-to-face PMB to agree structure.
Dec 11th – CB meeting to agree structure.
Jan 15th – Face-to-face PMB to agree draft v6.
Jan 21st – CB meeting to discuss v6.
Jan 28th – Submission of draft to Oversight Committee.
Feb 4th – Meeting with Oversight Committee + STFC.
Feb 12th – Near-final draft incorporating feedback.
Feb 22nd – Final comments/typos/corrections done.
Feb 24th – Submitted!
Mar 4th – Last possible submission date.
Apr 15th – PPRP.
May 14th – PPRP Visiting Panel.
Annotations on the time line: 12 Weeks; 3 Weeks; Eyjafjallajökull
Slide 6: PPRP Question-2
2. What consideration has been given to the impact of the revised LHC schedule announced by CERN following the Chamonix meeting in January 2010? Can more information be provided as to why you require the manpower and the hardware on the proposed schedule and at the level requested with respect to the new LHC schedule?
Requirements have evolved, and will continue to evolve. GridPP planning now reflects the latest information. Chamonix has made a significant change, but the financial impact falls mainly in the final year of GridPP3. An additional complication is that the GridPP3 funding profile requires dove-tailing with GridPP4.
Diagram labels: LHC Schedule; Experiment Computing Models; Global Resource Requirements; UK Resource Requirements; Hardware Costings; UK Resource Request; Scrutiny by C-RSG (April 2010), CRRB, LHCC (May 2010); UK Collaboration sizes; Experience with real data (March+ 2010); Chamonix 2010 (Jan 2010)
Slide 7: PPRP Question-3
3. Can you please clearly explain the current situation with regard to cooperation with and support from the other Tier-1s. How does this proposal benefit from collaboration with other Tier-1 centers (for example joint work, best practices etc.)?
Strategically: Tier-1 benefits through common policies agreed at MB and GDB level (plus working groups). DB, JG.
Tactically: Tier-1 benefits by learning of problems/solutions/best-practice via the bi-weekly Tier-1 Service Coordination Meetings and other technical forums such as HEPiX and its working sub-groups.
Operationally: Tier-1 benefits from immediate feedback via the Daily Operations Meeting.
Cooperation and collaboration between Tier-1s also happens at the experiment level via the computing operations teams and by pairing, wherein RAL has special relationships with specific Tier-1s to exchange custodial data.
Diagram labels: RAL Tier-1 Tier-1 MB GDB; Biweekly S. C. M.; Daily Ops Meeting
Slide 8: PPRP Question-4
4. The Panel would like to understand how the UK is performing in comparison with the wider wLCG effort. Has a performance comparison been done for Tier-1s (globally) and Tier-2s (both nationally and globally)? How was this done (e.g. what performance metrics were used) and how has the output been used?
A variety of measures were presented in the written document (thumbnails below). The basic message is that the UK Tier-1 and Tier-2s perform above average to excellent when compared globally. The Tier-2s are compared nationally by wLCG and the experiments, and these comparisons were used to inform the choices made in the GridPP4 proposal.
Slide 9: PPRP Questions
Please explain how usage of GridPP by the wider community has been considered? What inputs from non LHC experiments have been considered (e.g. T2K, SUPERNEMO)?
6. Please clarify why you were unable to arrive at reasonable resource estimates for non-LHC Particle Physics users? Please explain how you arrived at the 10% additional resource requested for non-LHC experiments.
The wider community were invited to provide written input. Inputs were received from BaBar/SuperB, H1, ILC, MINOS, NA62, PhenoGrid, and UKQCD. No input was received from MICE, SuperNemo, T2K, or SNO+ …
GridPP took this partial input and factored in our observations that:
(a) resource use by the wider community was unlikely to fall over GridPP4;
(b) 15% of resources had been used by non-LHC experiments in 2009;
(c) 28% of UK particle physicists likely to be doing data-analysis were on non-LHC experiments;
to arrive at a minimum reasonable request of 10% of the LHC resources for the wider community.
Slide 10: PPRP Questions
What are the targeted benefits to gain from European collaborations, such as EGI, other EU projects/infrastructures? What are the risks and implications in case EGI will not be successful?
13. Can you please clearly explain your dependency on other infrastructure and initiatives (e.g. JISC, EGI and wLCG) and how it will be managed should dependencies not be met?
Targeted benefits of EGI:
  Strengthen the GridPP Operational Security team.
  Expand and develop our operational management (GOCDB) and accounting (APEL) of the Grid to broaden opportunities for future support.
  Expand UK distributed operations support team by levering matching posts.
  Harmonization of collaborative computing operations in Europe to ensure longevity.
  Reduce load on wLCG Tier-1s by enabling alternative support of Tier-2s in countries without a Tier-1.
Targeted benefits of JISC and GEANT:
  Deployment and operation of the UK academic network, JANET, and the OPN across Europe (PC and RT on high level committees).
Targeted benefits of EMI:
  Ensuring the wLCG middleware is integrated into the future European strategy.
Targeted benefits of wLCG:
  Fundamental dependence on, and fully integrated partner of, wLCG; represented at the highest levels (GDB, MB, OB).
Slide 11: PPRP Questions (slide footer: RCUK e-Science Review)
Diagram labels: GridPP4 funded; NGS4 and EGI funded; EGI not funded; NGS4 not funded
Slide 12: PPRP Question-8
8. There seems to be marked difference between the manpower required for Tier-1 and Tier-2 centres when scaled for comparison against size of operation. Please can you justify? The Panel would like to understand what impact it will have on the UK contribution if the manpower is reduced in those centres where manpower is currently higher than the norm.
Clarification on point 8: For the first bit of the question we are referring to Tier2. The question for the second part should read as follows: "The size of operation at the Tier-2 sites varies considerably. The Panel would like to explore the requested manpower for operation and maintenance of the Tier-2 sites and how a reduced effort could be accommodated by pooling expertise at neighbouring sites."
Comparisons of “size of operation” need to consider the level-of-service delivered and the type of resources, in addition to the capacity. In particular, large disk storage systems, tape robot infrastructure, and large Oracle databases are all significantly more challenging than CPU. We are confident that the estimate of 26.6 FTE for the Tier-1 is a robust estimate of the effort required, based both on an international survey and on our own experience.
To descope the project by 20%, the advantages/disadvantages of pooling Tier-2 effort were explored. The Tier-2 roles were distinguished in terms of Group Analysis, Simulation, and User Analysis, and manpower was optimised to reflect these roles. We believe this is the optimum balance between pooling effort and gaining advantage from a distributed system (local support; leverage of institutional support and resources; mitigation of risk; developing future options).
Slide 13: PPRP Question-9
9. Recognising the transition of GridPP to a production phase, can you outline what plans/steps you will be taking to seek ongoing efficiency improvements?
The UK Grid must triple in capacity over the lifetime of GridPP4, so efficiency improvements are central to our delivery plan with a flat manpower profile. Three areas:
1) Efficiency improvements in deploying/managing/operating the hardware. This will be achieved by continuing to develop our tools and procedures; by adopting best practice from our international partners; and by identifying and disseminating best practice at the Tier-2 sites in the UK. This will be delivered by actively participating in all the relevant national and international meetings; by monitoring and comparing performance; and by reviewing progress.
2) Increasing the efficiency with which the experiment software/middleware can use the Grid infrastructure. This will be delivered by the posts requested in WP-D.
3) Increasing the efficiency with which data is handled at all levels, from the basic I/O to worker-nodes up to the handling of file transfers and metadata. This will be delivered by the data management posts in WP-C.
Slide 14: PPRP Question
Can you provide more information about the hardware experts that you are hiring?
Only limited new hiring due to redistribution of Tier-2 posts. Roles are described in Appendix-A of the proposal. They require much more than hardware expertise at both the Tier-1 and the Tier-2. At the latter, the roles are particularly multifaceted.
papers: 22 papers, 105 (~70 unique) authors.
We maintain a list of publications (best-effort basis, so incomplete) on the GridPP website. Currently 220 publications (2001 – 2009).
Slide 15: PPRP Question
We understand that the request of 3.5k of travel per year, per person, is based on GridPP3 experience; however, given the current funding climate, is it really essential to allocate travel at this level and have you considered reducing travel costs by increasing the use of video/audio conferencing tools?
GridPP works in an international collaboration (wLCG) but does not have staff abroad. Some travel is required, and one might expect this to be commensurate with the (non LTA) travel of the experiments.
In practice, GridPP manages travel at its current level by requesting co-funding from experiments for some trips. This dual-key approach ensures engagement with the experiments in a relevant way.
GridPP uses video/audio conferencing extensively and has significant expertise and experience in this area. Our website contains recommendations and the UK contributed significantly to the LHC report on collaborative tools.
However, there are times when national and international travel cannot be replaced by alternatives. To do so would reduce the influence and effectiveness of GridPP, limit engagement and compromise technical progress.
We believe the proposal as submitted requests a reasonable and responsible travel budget that is necessary for the functioning of the project.
Slide 16: PPRP Question-12
12. Tier-2 contributions by some host institutes are not high proportionally. Please provide details of how you intend to involve UK Universities in the Tier-2 centre investment and increase contributions to infrastructure and operation. What is the timeframe for this involvement?
The collective investment from the Tier-2 institutes in GridPP is extremely large. Although a bottom-up estimate on an institute-by-institute basis is not possible, a top-down order-of-magnitude estimate of contributions is as follows:
Capital costs (machine rooms etc.): £10.7m (extrapolated from specific examples)
Hardware (above that funded by GridPP): £3.3m (from looking at resources delivered to EGEE)
Electricity costs: £2.5m (based on average power costs)
Manpower not funded by GridPP: £1.9m (based on GridPP quarterly reports)
Total non-GridPP Tier-2 investment: £18.4m
This compares with GridPP investment of: £9.7m (staff and hardware)
This investment has come through multiple paths including JIF, SRIF, HEFCE, SFC, Regional-Development Grants, etc. We believe all institutes have contributed to infrastructure and operational costs.
This is a significant (and probably unique) contribution to the costs of the project, demonstrating the involvement of the institutions, and for which we are very grateful.
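As a quick cross-check of the Tier-2 investment figures on this slide, the following minimal Python sketch sums the four quoted contributions and compares the total with the GridPP investment. The amounts are copied from the slide; the variable names and the ratio calculation are illustrative additions, not part of the original presentation.

```python
# Top-down estimate of non-GridPP Tier-2 investment, in thousands of pounds (£k).
# Amounts are copied from the slide; the names below are illustrative only.
tier2_contributions_k = {
    "capital_costs": 10_700,               # machine rooms etc.
    "hardware_above_gridpp": 3_300,        # hardware beyond that funded by GridPP
    "electricity": 2_500,                  # based on average power costs
    "manpower_not_gridpp_funded": 1_900,   # from GridPP quarterly reports
}
gridpp_investment_k = 9_700                # staff and hardware

total_non_gridpp_k = sum(tier2_contributions_k.values())
ratio = total_non_gridpp_k / gridpp_investment_k

print(f"Total non-GridPP Tier-2 investment: £{total_non_gridpp_k / 1000:.1f}m")
print(f"Ratio to GridPP investment: {ratio:.2f}")
```

The sum reproduces the £18.4m total quoted on the slide, which is a little under twice the £9.7m GridPP investment.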
Slide 17: PPRP Question-14
14. Can you expand on how you intend to promote Technology or Knowledge Transfer? For example, with regard to the exploitation of middleware/security capability?
Build on the recognised successes of GridPP3, formalising the current ad-hoc process by introducing a steering group with members from within and beyond GridPP. This would support a more targeted approach to complement the current opportunistic and reactive environment.
The KT activities would be better linked to external bodies (e.g. Digital Systems KTN, Scalable Computing, NGS/NGI/EGI, Impact QM!, etc.) with a more structured approach.
In the middleware/security area, there are two jewels preserved in the GridPP4 proposal: GANGA (WP-D posts), which has been taken up widely and has the potential to attract new interest; and the GridSite security toolkit (WP-C post), which is embedded in the gLite middleware and also has some uptake as a website-construction toolkit.
GridPP has also led the development of security policy for wLCG and EGEE, and this has applicability to many Grids.
Slide 18: Descoping
The GridPP4 proposal as submitted had already undergone an extensive process of reduction to assimilate cuts of 20% required in December. During that process, all the potential options for descoping were explored and all investments were prioritised based on our past experience and our understanding of the future requirements.
The submitted proposal was carefully balanced and left no scope for further reduction by simply prioritising one work package over another; there are no optional extras included in the bid.
We have attempted to respond to the PPRP question on further reductions at two levels:
In order to go from a 20% to a 25% reduction, we have provided a detailed list of additional items that we would consider removing in order to save about £1.5m.
For the other scenarios, we give examples of areas where the scope of the project might be reduced, but presume this would require consideration by PPAN as this is essentially a science prioritisation exercise.
Slide 19: £1.5m Reduction
Hardware re-planning – depends on GridPP3: £330k
Reduced travel (WP-G) – This will reduce operational efficiency, experiment engagement, international influence, and impede technical progress: £150k
Reduced project management (in WP-E) – Would not re-appoint retiring deputy project leader. Some risk that this reduces our potential international impact and removes some high-level oversight at the Tier-1: £140k
Reduced Impact/KE activity (WP-F) – We will not be able to respond to the increased emphasis on Impact: £147k
Reduced support for non-LHC experiments (WP-D) – Community will not be able to fully capitalize on the investment in a Grid infrastructure: £180k
The 20% cuts have already reduced effort to a critical level in WP-A and WP-B. This was addressed by the Working Allowance (W.A.). It makes no sense to cut effort in WP-A and WP-B further whilst preserving the W.A. Remove W.A.: £491k
Total Reduction: £1.44m
The project would now have been reduced by 25%, but it tries to preserve the core mandate to deliver a computing Grid to the LHC experiments in an internationally competitive way.
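The total on this slide can be checked by summing the six line items. The sketch below is a minimal Python illustration; the amounts come from the slide, while the dictionary keys are hypothetical labels chosen for readability.

```python
# Line items of the additional reduction, in thousands of pounds (£k).
# Amounts are copied from the slide; the keys are illustrative labels.
reductions_k = {
    "hardware_replanning": 330,
    "reduced_travel_wp_g": 150,
    "reduced_project_management_wp_e": 140,
    "reduced_impact_ke_wp_f": 147,
    "reduced_non_lhc_support_wp_d": 180,
    "remove_working_allowance": 491,
}

total_k = sum(reductions_k.values())
print(f"Total reduction: £{total_k / 1000:.2f}m")

# Share of the total contributed by each item, largest first.
for name, amount in sorted(reductions_k.items(), key=lambda kv: -kv[1]):
    print(f"{name}: £{amount}k ({100 * amount / total_k:.0f}%)")
```

The items sum to £1,438k, which rounds to the £1.44m quoted; removing the Working Allowance is the largest single item, at roughly a third of the total.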
Slide 20: Working Allowance and Contingency
2 years of effort to address Risk-1 (CASTOR) and Risk-6 (Tier-1 Service Level).
Hardware costings: Risk-11, 19, 20 (Tier-1); Risk-14 and 21 (Tier-2s); Risk-18 (both).
4 years of effort to address Risk-9 (Tier-2 Service Level and EGI end).
4 years of effort to address Risk-15 (EGI/NGI transition) and Risk-22 (EGI funding).
3 years of effort to address Risk-12 (NGS4).
Slide 21: Larger Reductions
To make larger reductions, the project scope would need to be redefined. It was assumed this would have to be done at the PPAN level. Two possible scenarios were considered:
A) Remove all support for non-LHC experiments:
  10% of Tier-1 hardware: £580k
  0.5 FTE at Tier-1: £170k
  10% of Tier-2 hardware: £250k
  0.5 FTE at Tier-2: £160k
  0.5 – 1.0 FTE of support: £180k – £350k
  Total Reduction: up to £1.5m
B) Reduce support for LHC experiments (example scenario):
  Remove ALICE support for 11/12: £100k
  20% reduction in ATLAS group analysis (1.5 FTE): £475k
  20% reduction in CMS group analysis (0.75 FTE): £237k
  Comparable reduction in LHCb activities: £158k
  Reduction in Tier-2 hardware: £300k
  Reduction in data-support (1 FTE): £325k
  Total Reduction: £1.6m
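The two scenario totals can be reproduced from the listed items. The following Python sketch is only a cross-check under the assumption that the totals are plain sums of the items shown; the names are illustrative and do not come from the presentation.

```python
# Scenario totals from the "Larger Reductions" slide, in thousands of pounds (£k).
# Amounts are copied from the slide; all names are illustrative only.
scenario_a_fixed_k = {
    "tier1_hardware_10pct": 580,
    "tier1_half_fte": 170,
    "tier2_hardware_10pct": 250,
    "tier2_half_fte": 160,
}
# The slide quotes a range for 0.5 to 1.0 FTE of support.
scenario_a_support_range_k = (180, 350)

scenario_b_k = {
    "alice_support_11_12": 100,
    "atlas_group_analysis": 475,
    "cms_group_analysis": 237,
    "lhcb_activities": 158,
    "tier2_hardware": 300,
    "data_support": 325,
}

a_fixed = sum(scenario_a_fixed_k.values())
a_low = a_fixed + scenario_a_support_range_k[0]
a_high = a_fixed + scenario_a_support_range_k[1]
b_total = sum(scenario_b_k.values())

print(f"Scenario A: £{a_low}k to £{a_high}k")
print(f"Scenario B: £{b_total}k")
```

Under that assumption, Scenario A spans £1,340k to £1,510k (hence "up to £1.5m") and Scenario B comes to £1,595k, matching the rounded £1.6m.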
Slide 22: PPAN Feedback
Diagram labels: STFC Council; PPAN; PPRP; Science Board
PPAN feedback received August 11th:
“The GridPP proposal was considered by the Particle Physics, Astronomy and Nuclear Physics Science Committee (PPAN) at the meeting held on 20 July. PPAN has recommended support for the proposal. This is at a reduced level to the original request, but broadly in line with the advice received from the Projects Peer Review Panel (PPRP).”
“However, while agreed in principle, STFC is unfortunately not able to make the recommended commitment in full at this time due to the funding uncertainties arising from the challenging CSR 2010 exercise now underway. Consequently, it has been decided to make an interim award, pending the CSR outcome.”
Which means some, but not all, of the money for the first two years.
Slide 23: What is the recommendation?
There are two types of money: Capital (most of the hardware) and Resource (everything else). The balance of these has been fixed by PPAN, which is an additional constraint.
PPAN’s recommendation is basically our first (£1.5m) reduction scenario plus a hybrid of the two additional scenarios we proposed for larger reductions.
There is reduced support (but not zero) for non-LHC experiments.
There is a 10% reduction (not 20%) in the group analysis support for the LHC experiments.
Slide 24: Implementation
In some instances the PPAN/PPRP feedback has explicitly cut posts and/or roles. Group leaders have received this information.
Implementation of the other cuts is being, or will be, discussed with the Experiments and Institutes concerned, guided by strategic decisions at Monday’s PMB meeting.
Following all discussions, a revised GridPP4 plan will be prepared in early September, ratified by the CB, and presented to STFC for final approval.
It is hoped that the uncertainty generated by the CSR will be resolved before GridPP4 commences.
We should not lose sight of the fact that it is a major success to secure this level of funding at this difficult time. Although things will be challenging, I believe we can deliver the Grid that is required.
Slide 25: Top 10 Challenges
1 GridPP Funding
2 Manpower
3 Data Management
4 Data Storage
5 Evolving computing models
6 Hardware management
7 Hardware provision
8 EGI/NGI
9 Security
10 The unexpected
11 Moving Roger to the A-Team by 2011