Published by Elinor O’Neal. Modified over 9 years ago.
1
PhEDEx Monitoring
Nicolò Magini, CERN IT-ES-VOS
For the PhEDEx development team
Computing and Offline Monitoring Workshop, 11/05/2011
Experiment Support, CERN IT Department, CH-1211 Geneva 23, Switzerland, www.cern.ch/it
2
Outline
– PhEDEx monitoring
– PhEDEx webpage & datasvc
– PhEDEx plots
– PhEDEx shift monitoring
– PhEDEx latency monitoring
– PhEDEx agent monitoring
– PhEDEx storage monitoring
NOTE: This is a quick summary; for more details see the talk at O&C week:
https://indico.cern.ch/materialDisplay.py?contribId=9&sessionId=21&materialId=slides&confId=132001
3
PhEDEx monitoring
– Extensive status and historical tables
  – Link events, e.g. number/size of files (un)successfully transferred
    – Used to calculate rate/quality
  – Node and link queue stats, e.g. number/size of files requested/queued for transfer
    – Used by shifters to monitor queue health
– Populated by relevant agents in 5 min bins
– Regularly compacted to 1h bins
– Worked well so far; could require further optimization in the future (partitioning)
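The binning and rate/quality calculation on this slide can be sketched as follows. This is a minimal illustration, not the PhEDEx implementation: the tuple layout, the function names, and the definition of quality as successful transfers over attempted transfers are assumptions made for the example.

```python
from collections import defaultdict

BIN_1H = 3600  # seconds in a 1h bin

def compact_to_hourly(bins):
    """Sum 5-minute link-event bins into 1-hour bins.

    `bins` is an iterable of (bin_start, done_files, failed_files, done_bytes),
    with bin_start in epoch seconds. This layout is hypothetical.
    """
    hourly = defaultdict(lambda: [0, 0, 0])
    for start, done, failed, nbytes in bins:
        hour = start - (start % BIN_1H)  # start of the enclosing 1h bin
        acc = hourly[hour]
        acc[0] += done
        acc[1] += failed
        acc[2] += nbytes
    return {hour: tuple(acc) for hour, acc in sorted(hourly.items())}

def rate_and_quality(done, failed, nbytes, span_seconds):
    """Return (bytes per second, fraction of attempts that succeeded).

    Quality is None when nothing was attempted in the bin.
    """
    attempts = done + failed
    quality = done / attempts if attempts else None
    return nbytes / span_seconds, quality
```

Summing counters rather than averaging rates keeps the 1h bins exact: rate and quality can be recomputed from the compacted rows at any time.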
4
PhEDEx Datasvc
– PhEDEx Datasvc will be the single point of access to information in the PhEDEx DB
  – For PhEDEx web
  – For external monitoring tools
  – For your own PhEDEx monitoring tool!
  – https://cmsweb.cern.ch/phedex/datasvc/doc
– Main areas of work in 2011
  – Performance
  – Validation of writable APIs
  – Consistency of arguments and output
  – Adding new APIs as requested/needed
    – SlowFiles, SlowSubscriptions, DataTypeUsage…
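For "your own PhEDEx monitoring tool", a small helper to build datasvc request URLs might look like the sketch below. The URL layout (format, then instance, then API name), the defaults "json" and "prod", and the API and node names in the usage note are assumptions for illustration; the documentation page linked above is the authority on what is actually available.

```python
from urllib.parse import urlencode

# Base taken from the documentation URL on the slide.
BASE = "https://cmsweb.cern.ch/phedex/datasvc"

def datasvc_url(api, fmt="json", instance="prod", **params):
    """Build a datasvc URL of the assumed form BASE/<format>/<instance>/<api>?<args>.

    List-valued parameters are repeated in the query string.
    """
    query = urlencode(sorted(params.items()), doseq=True)
    url = f"{BASE}/{fmt}/{instance}/{api}"
    return f"{url}?{query}" if query else url
```

A call such as `datasvc_url("blockreplicas", node="T1_XX_Example")` (hypothetical API and node name) returns a URL that can be passed to any HTTP client; keeping URL construction separate from fetching makes the tool easy to test offline.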
5
PhEDEx webpage
– Existing webpage not maintainable
  – Single file of 10000 lines of Perl code
– Next-gen prototype not widely used
  – Unfamiliar, missing functionality
– Gradually replace pages in old webpage with next-gen modules using datasvc as backend
  – First example: new request panel
  – Upcoming: subscriptions page
    – Eventually with shopping cart
  – Rest over the course of 2011
– https://cmsweb.cern.ch/phedex
6
PhEDEx monitoring plots
– Well-known PhEDEx plots
  – GraphTool utility
  – Recent round of security fixes
  – No maintenance or development planned
7
PhEDEx monitoring plots
– Porting to Overview/Plotfairy framework
  – Note: Plotfairy backend support is independent of maintenance of the Overview page
  – Working to complete by this summer
– Other options explored, e.g. protovis
8
PhEDEx shift monitoring
– First next-gen monitoring panel for shifters has been available for a few months
– Others will be added in the coming months
– Other specialized monitoring panels already provided in next-gen prototype, but not widely used
9
Block latency monitoring
– Currently based on direct SQL queries and logfile parsing tools
– Table to collect block latency statistics deployed in PhEDEx 3 years ago
– Records block completion events
  – Block created, subscribed, first file routed, first file arrived, last file arrived
  – Keeps track of “system latency” vs. “human latency” (i.e. subscription suspended)
– Only partially debugged
  – e.g. multiple latency entries for the same transfer
  – e.g. blocks with negative latency
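The split between system and human latency, and the two debugging symptoms listed above, can be illustrated with a short sketch. The record fields and the rule "system latency = total latency minus suspended time" are assumptions for this example, not the actual PhEDEx table schema.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class BlockLatency:
    """One latency entry (hypothetical fields, times in epoch seconds)."""
    block: str
    destination: str
    subscribed: float
    last_arrived: float
    suspended: float = 0.0  # seconds the subscription spent suspended

    @property
    def total(self):
        # Wall-clock time from subscription to arrival of the last file.
        return self.last_arrived - self.subscribed

    @property
    def system(self):
        # Latency attributed to the system, excluding "human" suspension time.
        return self.total - self.suspended

def find_anomalies(entries):
    """Return (duplicate keys, entries with negative system latency)."""
    counts = Counter((e.block, e.destination) for e in entries)
    duplicates = sorted(key for key, n in counts.items() if n > 1)
    negative = [e for e in entries if e.system < 0]
    return duplicates, negative
```

Running such a check over the existing table is one way to size the two known problems before extending the schema.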
10
Block latency monitoring
– Latency monitor schema/agents
  – Debugging/understanding content of current table
  – Will extend schema to record more events, e.g. 25%/50%/75%/95% block completion mark
  – In progress, should be on Testbed by end of the month
– Latency visualisation, in summer
  – Datasvc API
  – Latency plots
  – To explore: publish per-file latency stats from FilePump logs
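As an illustration of the planned completion marks, the sketch below derives the time at which a given percentage of a block's files had arrived from per-file arrival times. The function name and input format are hypothetical; the real schema extension may record these events differently.

```python
def completion_marks(arrival_times, n_files, marks=(25, 50, 75, 95)):
    """Map each percentage mark to the arrival time of the file that reached it.

    `arrival_times` holds the arrival time of each file received so far;
    `n_files` is the total number of files in the block. A mark not yet
    reached maps to None.
    """
    arrived = sorted(arrival_times)
    result = {}
    for pct in marks:
        # Integer ceiling of n_files * pct / 100, avoiding float rounding.
        needed = (n_files * pct + 99) // 100
        reached = 0 < needed <= len(arrived)
        result[pct] = arrived[needed - 1] if reached else None
    return result
```

Comparing the 95% mark with the last-file arrival time separates blocks that are slow throughout from blocks held up by a few stuck files.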
11
PhEDEx agent monitoring
– PHEDEX_4_0_0 includes improvements for site agent health monitoring
– Information from all site agents is collected by a local watchdog agent
– Watchdog now produces a daily report on agent activity
  – Agent alerts, agent CPU/mem usage, etc.
  – Report content can be customized with a site-specific plugin
– Watchdog report can then be sent to site admins by various methods
– Could also be collected centrally for shifter monitoring, complementing the Agent Status webpage
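The idea of a daily report with site-specific customization can be sketched as a list of plugin functions, each adding lines to a default summary. This is only an illustration of the pattern: the statistics layout, function names, and the memory threshold are invented for the example and do not reflect the watchdog's actual plugin interface.

```python
def default_summary(agents):
    """One line per agent: alert count, CPU and memory usage."""
    return [
        f"{name}: alerts={s['alerts']} cpu={s['cpu_pct']:.1f}% mem={s['mem_mb']:.0f}MB"
        for name, s in sorted(agents.items())
    ]

def high_memory_plugin(agents, limit_mb=500):
    """Hypothetical site plugin: warn about agents above a memory limit."""
    return [
        f"WARNING {name} uses {s['mem_mb']:.0f}MB"
        for name, s in sorted(agents.items())
        if s["mem_mb"] > limit_mb
    ]

def build_report(agents, plugins=()):
    """Assemble the report: default summary followed by each plugin's lines."""
    lines = ["Daily agent report"] + default_summary(agents)
    for plugin in plugins:
        lines.extend(plugin(agents))
    return "\n".join(lines)
```

Because the report is plain text, the same output could be mailed to site admins or collected centrally without changing the plugins.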
12
PhEDEx storage monitoring
– PhEDEx Namespace
  – Framework for efficient interaction with local storage
  – Caching, storage dumps, directories…
– Currently used by BlockDownloadVerify agent
– Could also be used more widely by other scripts or local tools, e.g. FileDownloadVerify scripts
– Also evaluating use of Namespace to generate space accounting reports of local storage
  – Including storage areas not in PhEDEx, e.g. /store/user
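A space accounting report built from a storage dump could, under simple assumptions, look like the sketch below. The dump format (one "path size-in-bytes" pair per line) and the function name are hypothetical; real storage dumps differ between storage technologies.

```python
from collections import defaultdict

def space_by_area(dump_lines, depth=2):
    """Sum file sizes per storage area, i.e. the first `depth` directory levels.

    Each line is assumed to be "<path> <size in bytes>".
    """
    usage = defaultdict(int)
    for line in dump_lines:
        line = line.strip()
        if not line:
            continue
        path, size = line.rsplit(None, 1)
        dirs = [p for p in path.split("/") if p][:-1]  # drop the file name
        area = "/" + "/".join(dirs[:depth])
        usage[area] += int(size)
    return dict(usage)
```

Because the dump lists everything on disk, areas outside PhEDEx such as /store/user are accounted for in the same pass.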
13
Summary
– Datasvc
  – Framework for providing information from TMDB
  – Any operator SQL can (should!) become an API
– Website
  – Take the good from the next-gen prototype lesson: richer navigation, filtering and presentation
– Watchdog agent
  – Report summaries and alerts, as desired by the user
– Namespace framework
  – Generic, lightweight framework for SE interaction; can be used by sites for all sorts of SE tools