M.C. Vetterli – WLCG-OB, CERN; October 27, 2008 – #1 Simon Fraser
Status of the WLCG Tier-2 Centres
M.C. Vetterli, Simon Fraser University and TRIUMF
WLCG Overview Board, CERN, October 27th, 2008

Sources of Information
- Discussions with experiment representatives in July
- APEL monitoring portal: http://www3.egee.cesga.es/gridsite/accounting/CESGA/egee_view.php
- WLCG reliability reports: http://lcg.web.cern.ch/LCG/accounts.htm
- October GDB meeting, dedicated to Tier-2 issues: http://indico.cern.ch/conferenceDisplay.py?confId=20234
- Talks from the last OB & LHCC
Slides labeled with a * are from MV’s LHCC rapporteur talk

Tier-2 Performance Summary*
- Overall, the Tier-2s are contributing much more now
- Significant fractions of the Monte Carlo simulations are being done in the T2s for all experiments
- Reliability is better, but still needs to improve
- The CCRC’08 exercise is generally considered a success for the Tier-2s

Tier-2 Centres in CCRC’08 – General*
- Overall, the Tier-2s and the experiments considered the CCRC’08 exercise to be a success
- The networking/data transfers were tested extensively; some FTS tuning was needed, but it worked out
- Experiments tended to continue other activities in parallel, which is a good test of the system, although the load was not as high as anticipated
- While CMS did include significant user analysis activities, the chaotic use of the Grid by a large number of inexperienced people is still to be tested

Tier-2 Issues/Concerns
As of the CB and the meetings with experiments this summer
- Communications: Do Tier-2s have a voice? Is there a good mechanism for disseminating information?
- Better monitoring: Pledges vs actual vs used
- Hardware acquisitions: What should be bought? kSI2006?
- Tier-2 capacity: Size of datasets? Effect of LHC delay?
- …

Tier-2 Issues/Concerns
- Upcoming onslaught of users: Some user analysis tests have been done but scaling is a concern
- User Support: Ticketing system exists but it is not really used for user support issues. This affects Tier-2s especially.
- Federated Tier-2s: Tools to federate? Monitoring? (averaging)
- Interoperability of EGEE, OSG, and NDGF should be improved
- Software/Middleware updates: Could be smoother; too frequent

Communications for Tier-2s
- Identified by the T2s at the last CB as a serious problem. It is interesting to me that many in experiment computing management did not share this concern.
- Should communication be organized according to experiment or to Tier-1 association? There are also differing opinions on this.
- There are two issues:
  + Grid middleware/operations
  + Experiment software
- My view after studying this is that the situation is OK for “tightly coupled” Tier-2s, but not for remote and smaller Tier-2s that are not well coupled to a Tier-1.

Communications for Tier-2s
Many lines of communication do indeed exist. Some examples are:
CMS has two Tier-2 coordinators: Ken Bloom (Nebraska) and Giuseppe Bagliesi (INFN)
- attend all operations meetings
- feed T2 issues back to the operations group
- write T2-relevant minutes
- organize T2 workshops
ALICE has designated 1 Core Offline person in 3 to have privileged contact with a given T2 site manager
- weekly coordination meetings
- Tier-2 federations provide a single contact person
- A Tier-2 coordinates with its regional Tier-1

Communications for Tier-2s
ATLAS uses its cloud structure for communications
- Every Tier-2 is coupled to a Tier-1
- 5 national clouds; others have foreign members (e.g. “Germany” includes Krakow, Prague, Switzerland; Netherlands includes Russia, Israel, Turkey)
- Each cloud has a Tier-2 coordinator
Regional organizations, such as:
+ France Tier-2/3 technical group:
  - coordinates with Tier-1 and with experiments
  - monthly meetings
  - coordinates procurement and site management
+ GRIF: Tier-2 federation of 5 labs around Paris
+ Canada: Weekly teleconferences of technical personnel (T1 & T2) to share information and prepare for upgrades, large production, etc.
+ Many others exist; e.g. in the US and the UK

Communications for Tier-2s
- Tier-2 Overview Board reps: Michel Jouvin and Atul Gurtu have just been appointed to the OB to give the Tier-2s a voice there.
- Tier-2 mailing list: Actually exists and is being reviewed for completeness & accuracy
- Tier-2 GDB: The October GDB was dedicated to Tier-2 issues
  + reports from experiments: role of the T2s; communications
  + talks on regional organizations
  + discussion of accounting
  + technical talks on storage, batch systems, middleware
  Seems to have been a success; repeat a couple of times per year?
M.C. Vetterli – WLCG-OB, CERN; October 27, 2008 – #13 Simon Fraser But how much of this is a problem of under-use rather than under-contribution? a task force has been set up to extract installed capacities from the Glue schema Monthly APEL reports still undergo significant modifications from first draft. Good because communication with T2s better Bad because APEL accounting still has problems Accounting seems to be very finicky; breaks when the CE or MON box is upgraded How are jobs distributed to the Tier-2s? Tier-2 Installed Resources

Tier-2 Hardware Questions
How does the LHC delay affect the requirements and pledges for 2009?
+ We are told to go ahead and buy what was planned, but we have already seen some under-use of CPU and we are now starting to see this for storage as well
We need to use something other than SpecInt2000!
+ This benchmark is totally out-of-date & useless for new CPUs
+ Continued delays in SpecHEP can cause sub-optimal decisions

Tier-2 Hardware Questions
Networking to the nodes is now an issue.
+ With 8 cores per node, 1 GigE connection ≈ 16.8 MB/sec/core
+ Tier-2 analysis jobs run on reduced data sets and can do rather simple operations; we have seen 7.5 MB/sec at ATLAS and much more (x10?)
+ Do we need to go to Infiniband?
+ We certainly need increased capability for the uplinks; we should have a minimum of fully non-blocking GigE to the worker nodes.
We need more guidance from the experiments. The next round of purchases is now!
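
The per-core bandwidth figure above is simple arithmetic, and a rough sketch of it may help when sizing worker-node networking. The helper name `per_core_bandwidth_mb_s` is hypothetical. The reading that the quoted 16.8 MB/sec/core comes from treating 1 Gbit as 2^30 bits is an assumption, not something the slides state; with the decimal line rate of 10^9 bit/s the share is closer to 15.6 MB/sec/core.

```python
def per_core_bandwidth_mb_s(link_bits_per_s: float, cores: int) -> float:
    """Share of one node's network link per core, in MB/s (1 MB = 10^6 bytes)."""
    bytes_per_s = link_bits_per_s / 8      # bits -> bytes
    return bytes_per_s / cores / 1e6       # split evenly over the cores

CORES_PER_NODE = 8                         # from the slide

# Decimal gigabit: 10^9 bit/s.
decimal_share = per_core_bandwidth_mb_s(1e9, CORES_PER_NODE)

# Assumption: the slide's 16.8 figure treats 1 Gbit as 2^30 bits.
binary_share = per_core_bandwidth_mb_s(2**30, CORES_PER_NODE)

# Fraction of the per-core share used by a job reading at the
# 7.5 MB/sec rate quoted on the slide.
observed_rate_mb_s = 7.5
fraction_used = observed_rate_mb_s / decimal_share

print(f"decimal GigE share: {decimal_share:.1f} MB/s per core")
print(f"binary  GigE share: {binary_share:.1f} MB/s per core")
print(f"7.5 MB/s job uses {fraction_used:.0%} of its decimal share")
```

Under either convention, a job reading ten times faster than 7.5 MB/sec would exceed its even share of a single GigE link, which is the concern behind the Infiniband and uplink questions.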

Summary
- The role of the Tier-2 centres has increased markedly in the last year
- >50% of Monte Carlo simulation is done in the T2s now.
- The CCRC’08 exercise is considered a success by the Tier-2s and by the experiments.
- Availability and reliability are up, but still need improvement.
- Resource acquisition vs pledges is better but still needs work
Issues for Tier-2s:
- communication should be (& is being) improved
- work should ramp up on chaotic user analysis
- reporting actual resources should be established
- improved user support is needed