Multicore Accounting John Gordon, STFC-RAL WLCG MB, July 2015.

Outline
History
Recent Progress
Current Status
Plan

History
I first raised the issue of sites publishing the number of cores used per job at the OMB last December, with an update in January. There was some initial improvement, but then progress flattened off. WLCG are now running many more multicore jobs and wish to see this reflected in accounting. Knowing the number of cores used is important for calculating the effective wallclock time and thus the overall occupancy of a cluster.
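The point about effective wallclock can be made concrete with a small calculation. This is a minimal sketch, not APEL code. The record type `JobRecord` and its fields are hypothetical. It assumes effective wallclock is elapsed time multiplied by the number of cores the job held, and that occupancy is the used core-hours divided by the cores available over the period. The example shows how treating multicore jobs as single-core understates occupancy.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class JobRecord:
    # Hypothetical per-job accounting record; field names are illustrative.
    wallclock_hours: float  # elapsed (wall) time of the job
    cores: int              # number of cores the job occupied


def core_wallclock_hours(job: JobRecord) -> float:
    """Effective wallclock: elapsed time multiplied by cores held."""
    return job.wallclock_hours * job.cores


def occupancy(jobs: list[JobRecord], total_cores: int, period_hours: float) -> float:
    """Fraction of the cluster's available core-hours used by the jobs."""
    used = sum(core_wallclock_hours(j) for j in jobs)
    return used / (total_cores * period_hours)


if __name__ == "__main__":
    jobs = [JobRecord(10.0, 8), JobRecord(10.0, 1), JobRecord(5.0, 1)]
    # Counting cores: 80 + 10 + 5 = 95 core-hours on a 10-core node over 10 h.
    print(occupancy(jobs, total_cores=10, period_hours=10.0))  # 0.95
    # Treating every job as single-core gives only 25 of 100 core-hours.
    naive = [JobRecord(j.wallclock_hours, 1) for j in jobs]
    print(occupancy(naive, total_cores=10, period_hours=10.0))  # 0.25
```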

Recent Progress
At the June meeting of the WLCG Grid Deployment Board I reported that 87% of LHC CPU use was now reported as coming from sites/CEs which reported the number of cores per job. Since there were some obvious omissions from important sites and countries, I was asked to address this. I raised tickets against all NGIs and gave them a link to the June core-publishing status of their sites which run LHC work. This has mainly been successful: by the end of June we had 95% publishing. This excluded sites which didn't run any LHC work in June. In July I extended the campaign to include EGI sites not running any LHC work.
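A figure like the 87% above is a CPU-weighted share: HS06 hours from CEs that publish cores, divided by all HS06 hours. The sketch below assumes per-CE monthly summaries. The `CESummary` type, its fields and the site/CE names are hypothetical, and the numbers in the example are invented for illustration.

```python
from typing import Iterable, NamedTuple


class CESummary(NamedTuple):
    # Hypothetical monthly summary per CE; names are illustrative only.
    site: str
    ce: str
    hs06_hours: float
    reports_cores: bool  # True if the CE's job records carry a core count


def percent_cpu_with_cores(rows: Iterable[CESummary]) -> float:
    """Share (in %) of total HS06 hours coming from CEs that publish cores."""
    rows = list(rows)
    total = sum(r.hs06_hours for r in rows)
    if total == 0:
        return 0.0
    with_cores = sum(r.hs06_hours for r in rows if r.reports_cores)
    return 100.0 * with_cores / total


if __name__ == "__main__":
    summaries = [
        CESummary("SITE-A", "ce01.site-a.example", 900.0, True),
        CESummary("SITE-B", "ce01.site-b.example", 100.0, False),
    ]
    print(percent_cpu_with_cores(summaries))  # 90.0
```

Weighting by CPU rather than counting CEs is what lets a long tail of small, non-publishing CEs barely move the percentage.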

Current Status
There are still about 60 sites that have published jobs without cores in the last few days, but there is a long tail of failed jobs and rogue CEs that don't amount to significant CPU use.
There is a smaller number of sites with some or all CEs not reporting cores.
A few have problems not under their control.
Many have never responded to tickets.

In July, 98% of LHC CPU was reported with the number of cores (an underestimate).

Sites publishing zero-core jobs since 13/7/15 (#jobs and CPU for July)

Site                | Country          | LHC?             | Jobs / HS06 Hours (truncated)
Hephy-Vienna        | Austria (NGI-IT) | CMS              | ,927,676
RWTH-Aachen         | Germany          | CMS              | ,113,916
MPPMU               | Germany          | ATLAS            | ,079,792
DESY-HH             | Germany          | CMS              | ,035,869
RRC-KI              | Russia           | ALICE            | ,788,016
IN-DAE-VECC-02      | India            | ALICE            | ,309,750
IFIC-LCG2           | Spain            | ATLAS            | ,176,147
ICN-UNAM            | Mexico           | ALICE            | ,787
Ru-Troitsk-INR-LCG2 | Russia           | ALICE, CMS, LHCb | ,022
CBPF                | Brazil           | No               | ,991
Kharkov-KIPT-LCG2   | Ukraine          | CMS              | ,430
ru-Moscow-SINP-LCG2 | Russia           | CMS              | ,733
HEPHY-UIBK          | Austria (NGI-IT) | ATLAS            | ,471
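A list like the one above can be produced by filtering job records on the core count and aggregating per site. This is a hedged sketch, not the query the APEL team actually ran. It makes several assumptions:

* The record type `UsageRecord` and its fields are hypothetical.
* A missing core count is stored as 0, as the phrase "zero core jobs" suggests.
* Sites are ranked by HS06 hours, largest first.

```python
from collections import defaultdict
from datetime import date
from typing import Dict, Iterable, List, NamedTuple, Tuple


class UsageRecord(NamedTuple):
    # Hypothetical per-job record; real accounting records have many more fields.
    site: str
    end_date: date
    cores: int          # assumed 0 when the CE did not publish a core count
    hs06_hours: float


def zero_core_sites(records: Iterable[UsageRecord],
                    since: date) -> List[Tuple[str, int, float]]:
    """(site, jobs, HS06 hours) for zero-core jobs ending on or after `since`,
    sorted with the largest CPU consumers first."""
    jobs: Dict[str, int] = defaultdict(int)
    hours: Dict[str, float] = defaultdict(float)
    for r in records:
        if r.cores == 0 and r.end_date >= since:
            jobs[r.site] += 1
            hours[r.site] += r.hs06_hours
    return sorted(((s, jobs[s], hours[s]) for s in jobs),
                  key=lambda t: t[2], reverse=True)


if __name__ == "__main__":
    recs = [
        UsageRecord("SITE-A", date(2015, 7, 14), 0, 10.0),
        UsageRecord("SITE-A", date(2015, 7, 20), 0, 5.0),
        UsageRecord("SITE-A", date(2015, 7, 20), 8, 40.0),  # has cores: skipped
        UsageRecord("SITE-B", date(2015, 7, 15), 0, 20.0),
        UsageRecord("SITE-C", date(2015, 7, 1), 0, 99.0),   # before cutoff
    ]
    print(zero_core_sites(recs, since=date(2015, 7, 13)))
    # [('SITE-B', 1, 20.0), ('SITE-A', 2, 15.0)]
```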

Issues
DESY-HH publish from CREAM but not from ARC-CE.
  – Outstanding ticket with the ARC team.
  – Suspect MPPMU have the same issue, although no response from them.
RWTH-Aachen: no response.
Austrian sites are part of the Italian NGI but were not alerted.
  – Acknowledged, but no action in a week.
The Spanish site is publishing from multicore queues but admitted they hadn't changed the single-core queue.
  – No further response in over two weeks.
Russia (4), India, Mexico, Ukraine: no response.

Country View / WLCG View [charts not reproduced]

Within a Site [chart not reproduced; slide footer: John Gordon, MB June 2015]

Summary
The solution for APEL is simple: set parallel=true in parser.cfg for parsers parsing batch logs.
For other accounting systems, ask for advice.
Just start publishing from now; don't backdate.
I need help with recalcitrant sites.
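A site can check its own parser.cfg for the setting with Python's standard `configparser`. This is a minimal sketch under several assumptions:

* The batch-log parser options sit in a `[batch]` section, as in the APEL client's shipped parser.cfg. Confirm this against the version you run.
* The fragment in `EXAMPLE_PARSER_CFG` is illustrative, not a complete file.
* The helper name `batch_parallel_enabled` is hypothetical.

```python
import configparser

# Illustrative fragment only; assumes the batch-log parser settings live in a
# [batch] section of parser.cfg, as in the APEL client's shipped file.
EXAMPLE_PARSER_CFG = """
[batch]
enabled = true
parallel = true
"""


def batch_parallel_enabled(cfg_text: str) -> bool:
    """True if [batch] parallel is set to a true value; False if absent."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback covers a missing [batch] section or a missing 'parallel' option
    return parser.getboolean("batch", "parallel", fallback=False)


if __name__ == "__main__":
    print(batch_parallel_enabled(EXAMPLE_PARSER_CFG))  # True
```

Treating a missing option as false mirrors the situation in the slides: a site that never changed the setting is not publishing cores.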