1 A lightweight Monitoring and Accounting system for LHCb DC'04 production V. Garonne R. Graciani Díaz J. J. Saborido Silva M. Sánchez García R. Vizcaya.


1. A lightweight Monitoring and Accounting system for LHCb DC'04 production
V. Garonne, R. Graciani Díaz, J. J. Saborido Silva, M. Sánchez García, R. Vizcaya Carrillo

2. Outline
- Manifesto
- Monitoring
  - Web interface
  - Internals
- Accounting
  - Web interface
  - Internals
- Outlook
- URLs

3. Manifesto
- Monitoring and Accounting are tasks in DIRAC [377]
  - DIRAC is a production grid for LHCb
- The Monitoring reports the status of jobs while they are in the WMS (Workload Management System) [366]
  - Instantaneous snapshot of the system
  - No historic records
- The Accounting records the status of jobs after they leave the WMS
  - Provides historic records, accumulated statistics and the evolution of recorded variables with time
- Main users: production and site managers

4. Design choices
- Monitoring
  - Job information stored centrally in the WMS
  - Information provided directly by the job and the WMS (push)
  - Passive services: no pushing of information
  - No need for a common consumer API
  - Job and application state stored together
- Accounting
  - Separate infrastructure from the monitoring
  - A job is never in the Accounting and the Monitoring at the same time
  - Domain specific: LHCb production jobs

5. Information Flow
[Diagram: information flow in DIRAC between Job (heart-beat), Backend Services & Agents, the WMS with its Job Database, Monitoring (read/write), Cleaner Agent, Accounting Database, Accounting (write/read), Web interface and Users]

6. Monitoring Web Interface 1
- Interface to query the monitoring service
- Clicking on a JobId pops up a window with the job details

7. Monitoring Web Interface 2
- The overview shows predefined plots on the production
  - Generated every few minutes
- PyChart used as graphics engine
  - 100% Python
  - Supports SVG
[Plot: Running jobs by site]

8. Monitoring Web Interface 3
[Plot: Job status by site and production ID]
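
The two overview plots boil down to counting jobs grouped by site, and by site, production ID and status. A minimal sketch of that aggregation step, assuming job records are available as (site, production_id, status) tuples; the record layout, the production IDs and the site name DIRAC.SiteB.org are hypothetical, and the drawing itself (PyChart on the slides) is left out.

```python
from collections import Counter

# Hypothetical job records: (site, production_id, status).
# In DIRAC these would come from the Job Database through the monitoring service.
jobs = [
    ("DIRAC.CERN.ch", 101, "running"),
    ("DIRAC.CERN.ch", 101, "done"),
    ("DIRAC.CERN.ch", 102, "running"),
    ("DIRAC.SiteB.org", 101, "waiting"),
    ("DIRAC.SiteB.org", 102, "running"),
]


def running_by_site(records):
    """Count running jobs per site (data for a 'Running jobs by site' plot)."""
    return Counter(site for site, _prod, status in records if status == "running")


def status_by_site_and_production(records):
    """Count jobs per (site, production id, status) combination."""
    return Counter(records)


print(running_by_site(jobs))
print(status_by_site_and_production(jobs))
```

Regenerating the overview every few minutes then amounts to re-running these counts on a fresh snapshot and redrawing.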

9. Monitoring Internals
- An XML-RPC service exposing whatever parameters are known to DIRAC
- Job parameters stored internally by DIRAC:
  - Primary parameters
    - Execution site, job status, job owner, etc.
    - Fixed and centrally defined: fast access
    - Can be queried on
  - Secondary parameters
    - Number of steps, internal job state, etc.
    - Defined by the production job itself
    - Stored as key-value pairs
    - Slower access; cannot be queried on
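
The primary/secondary split can be pictured as two tables: fixed, indexed columns for the primary parameters and a generic key-value table for the secondary ones. A minimal sketch using an in-memory SQLite database; the slides do not say which database DIRAC used, and all table, column and function names here (and the owner "lhcbprod") are hypothetical.

```python
import sqlite3

# Hypothetical schema: primary parameters are fixed, indexed columns;
# secondary parameters are key-value rows defined by the job itself.
db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Jobs (
    JobID  INTEGER PRIMARY KEY,
    Site   TEXT,
    Status TEXT,
    Owner  TEXT
);
CREATE INDEX JobsSiteStatus ON Jobs (Site, Status);
CREATE TABLE JobParameters (
    JobID INTEGER REFERENCES Jobs (JobID),
    Name  TEXT,
    Value TEXT
);
""")

PRIMARY = {"Site", "Status", "Owner"}


def add_job(job_id, site, status, owner, **secondary):
    """Store the primary parameters as columns and the rest as key-value rows."""
    db.execute("INSERT INTO Jobs VALUES (?, ?, ?, ?)", (job_id, site, status, owner))
    db.executemany(
        "INSERT INTO JobParameters VALUES (?, ?, ?)",
        [(job_id, name, str(value)) for name, value in secondary.items()],
    )


def get_jobs(**conditions):
    """Select job ids by primary parameters only (column names are whitelisted)."""
    if not set(conditions) <= PRIMARY:
        raise ValueError("can only query on primary parameters")
    where = " AND ".join(f"{col} = ?" for col in conditions) or "1"
    rows = db.execute(f"SELECT JobID FROM Jobs WHERE {where} ORDER BY JobID",
                      tuple(conditions.values()))
    return [job_id for (job_id,) in rows]


def get_job_parameter(job_id, name):
    """Fetch one secondary parameter of a single job."""
    row = db.execute("SELECT Value FROM JobParameters WHERE JobID = ? AND Name = ?",
                     (job_id, name)).fetchone()
    return row[0] if row else None


add_job(1, "DIRAC.CERN.ch", "running", "lhcbprod", LocalBatchId="4711", Steps=3)
add_job(2, "DIRAC.CERN.ch", "done", "lhcbprod", LocalBatchId="4712")
print(get_jobs(Status="running", Site="DIRAC.CERN.ch"))
print(get_job_parameter(1, "LocalBatchId"))
```

Refusing queries on secondary parameters in `get_jobs` mirrors the "cannot be queried on" rule: only the indexed primary columns are used for selection.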

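10. JMS basic API example

    from xmlrpc.client import ServerProxy  # module was called xmlrpclib in Python 2

    # Placeholder (hypothetical): set to the URL of the DIRAC monitoring XML-RPC service
    monitoring_url = 'http://localhost:8000/'
    server = ServerProxy(monitoring_url)

    # Retrieve the list of jobs matching some conditions on primary parameters
    conditions = {'Status': 'running', 'Site': 'DIRAC.CERN.ch'}
    jobreq = server.getJobs(conditions)

    # The result is a dict: 'Status' flags success, 'Value' holds the job ids
    if jobreq['Status']:
        # Print some parameters for each job (two remote calls per job)
        for jobid in jobreq['Value']:
            print(server.getJobSite(jobid))
            print(server.getJobParameter(jobid, 'LocalBatchId'))

        # Bulk operation: a single remote call for all selected jobs
        summary = server.getJobsPrimarySummary(jobreq['Value'])

Timings annotated on the slide: ~3 s to select 95 out of 50k jobs; ~0.7 s; ~40 s.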
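
To try the client code without access to DIRAC, a throwaway local XML-RPC server can stand in for the monitoring service. A minimal sketch using only the Python standard library: the method names are taken from the slide, but the server-side behaviour, the job records and the shape of the bulk summary are assumptions, as is everything beyond the 'Status'/'Value' dictionary the slide shows.

```python
import threading
from xmlrpc.client import ServerProxy
from xmlrpc.server import SimpleXMLRPCServer

# Hypothetical stand-in for the Job Database: job id -> parameters.
JOBS = {
    1: {"Status": "running", "Site": "DIRAC.CERN.ch", "LocalBatchId": "4711"},
    2: {"Status": "done", "Site": "DIRAC.CERN.ch", "LocalBatchId": "4712"},
    3: {"Status": "running", "Site": "DIRAC.SiteB.org", "LocalBatchId": "88"},
}


def ok(value):
    """Wrap a result in the {'Status': ..., 'Value': ...} shape used on the slide."""
    return {"Status": True, "Value": value}


def getJobs(conditions):
    """Return ids of jobs whose parameters match every condition."""
    return ok([jid for jid, job in sorted(JOBS.items())
               if all(job.get(k) == v for k, v in conditions.items())])


def getJobSite(jobid):
    return JOBS[jobid]["Site"]


def getJobParameter(jobid, name):
    return JOBS[jobid].get(name, "")


def getJobsPrimarySummary(jobids):
    # XML-RPC struct keys must be strings, so job ids are converted to str.
    return ok({str(jid): {"Status": JOBS[jid]["Status"], "Site": JOBS[jid]["Site"]}
               for jid in jobids})


# Serve on an ephemeral local port in a background thread.
service = SimpleXMLRPCServer(("127.0.0.1", 0), logRequests=False)
for func in (getJobs, getJobSite, getJobParameter, getJobsPrimarySummary):
    service.register_function(func)
threading.Thread(target=service.serve_forever, daemon=True).start()

# Client side, mirroring the slide's example.
server = ServerProxy("http://127.0.0.1:%d/" % service.server_address[1])
jobreq = server.getJobs({"Status": "running", "Site": "DIRAC.CERN.ch"})
if jobreq["Status"]:
    for jobid in jobreq["Value"]:  # two round trips per job
        print(jobid, server.getJobSite(jobid),
              server.getJobParameter(jobid, "LocalBatchId"))
    summary = server.getJobsPrimarySummary(jobreq["Value"])["Value"]  # one round trip

service.shutdown()
service.server_close()
```

The loop costs two HTTP round trips per selected job, while the bulk call costs one in total, which is why bulk operations pay off as the number of selected jobs grows.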

11. Accounting Web Interface 1
- GUI for querying the Accounting
- Shows results:
  - As graphics
  - As a table
  - As an Excel sheet
- Several types of report; only a few are shown here

12. Accounting Web Interface 2
[Plot: Used resources by site]

13. Accounting Web Interface 3
- Used resources by event type
[Plots: MB/job, CPU/job, failed jobs, CPU vs. execution time, input and output data vs. CPU]
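
Per-event-type figures such as MB/job, CPU/job and failed jobs are normalised aggregates over the accounting records. A minimal sketch, assuming a flat record per job; the field names and the event type labels "minbias" and "b-inclusive" are hypothetical, and averaging MB and CPU over successful jobs only is a design choice of this sketch, not something the slides specify.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class JobRecord:
    event_type: str   # hypothetical event type label
    cpu_s: float      # CPU time in seconds
    output_mb: float  # output data in MB
    failed: bool


def per_event_type(records):
    """Return {event_type: (MB/job, CPU s/job, failed fraction)}.

    MB/job and CPU/job are averaged over successful jobs only (None if there
    are none); the failed fraction is taken over all jobs of that type.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[rec.event_type].append(rec)
    summary = {}
    for event_type, recs in groups.items():
        ok = [r for r in recs if not r.failed]
        mb = sum(r.output_mb for r in ok) / len(ok) if ok else None
        cpu = sum(r.cpu_s for r in ok) / len(ok) if ok else None
        summary[event_type] = (mb, cpu, (len(recs) - len(ok)) / len(recs))
    return summary


records = [
    JobRecord("minbias", 40000.0, 200.0, False),
    JobRecord("minbias", 44000.0, 220.0, False),
    JobRecord("b-inclusive", 60000.0, 300.0, False),
    JobRecord("b-inclusive", 0.0, 0.0, True),
]
print(per_event_type(records))
```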

14. Accounting Web Interface 4
- Produced data by production ID
  - Rates and cumulative totals
  - Number of events and GB of output
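
The "rates" and "cumulative" views are the same per-production series shown per time bin and as a running total. A minimal sketch, assuming accounting records carry a day number, a production ID, the number of events and the output size in GB; the record layout and all values are hypothetical.

```python
from collections import defaultdict, namedtuple
from itertools import accumulate

# Hypothetical accounting records: day number, production id, events, output in GB.
Record = namedtuple("Record", "day production_id events output_gb")
records = [
    Record(1, 101, 5000, 2.5),
    Record(1, 102, 3000, 1.2),
    Record(2, 101, 7000, 3.5),
    Record(3, 101, 6000, 3.0),
    Record(3, 102, 4000, 1.6),
]


def daily_totals(records, production_id, days, field):
    """Per-day rate of `field` for one production (0 on days without jobs)."""
    per_day = defaultdict(int)
    for rec in records:
        if rec.production_id == production_id:
            per_day[rec.day] += getattr(rec, field)
    return [per_day[day] for day in days]


def cumulative(values):
    """Running total, as plotted in the cumulative view."""
    return list(accumulate(values))


days = [1, 2, 3]
for prod in (101, 102):
    rate = daily_totals(records, prod, days, "events")
    print(prod, rate, cumulative(rate))
```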

15. Accounting Web Interface 5
- WMS statistics on DIRAC's performance
- Plots:
  - Job execution time vs. WMS waiting time
  - Job execution time vs. WMS matching time
- Granularity:
  - Per site
  - Per production
  - Integral
- Allows assessment of DIRAC's performance
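
The three granularities differ only in the grouping key. A minimal sketch that reduces the timing plots to mean values per site, per production and integrated over all jobs; reducing the scatter plots to means is a simplification chosen here, and the record fields, site names and numbers are hypothetical.

```python
from collections import defaultdict
from statistics import mean

FIELDS = ("waiting", "matching", "execution")

# Hypothetical per-job timings in seconds; site and production values are illustrative.
jobs = [
    {"site": "DIRAC.CERN.ch", "production": 101, "waiting": 600, "matching": 4, "execution": 36000},
    {"site": "DIRAC.CERN.ch", "production": 102, "waiting": 1200, "matching": 6, "execution": 30000},
    {"site": "DIRAC.SiteB.org", "production": 101, "waiting": 3000, "matching": 8, "execution": 42000},
]


def averages(jobs, key=None):
    """Mean WMS waiting, matching and execution time per group.

    key="site" or key="production" groups by that field; key=None gives the
    integral over all jobs under the single group "all".
    """
    groups = defaultdict(list)
    for job in jobs:
        groups[job[key] if key is not None else "all"].append(job)
    return {
        group: {field: mean(job[field] for job in members) for field in FIELDS}
        for group, members in groups.items()
    }


print(averages(jobs, "site"))
print(averages(jobs, "production"))
print(averages(jobs))
```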

16. Accounting Internals
- Job and DIRAC statistics kept in a database:
  - Site contribution
  - Data produced and used by jobs and steps
  - Timing for jobs, steps and DIRAC internals
- Separate XML-RPC interfaces to populate and query the accounting tables
  - Both interfaces have restricted access
- Jobs are moved to the accounting system by a cleaner agent after being validated
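
The cleaner agent's job is to move each validated job out of the monitoring tables and into the accounting tables, so that a job is never in both. A minimal sketch of one cleaning pass; the slides describe separate databases, whereas here both (hypothetical) tables share one in-memory SQLite connection so that the insert and the delete can be a single transaction. The table names, columns and the validation rule are all assumptions.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE Jobs (JobID INTEGER PRIMARY KEY, Site TEXT, Status TEXT, CPUTime REAL);
CREATE TABLE AccountingJobs (JobID INTEGER PRIMARY KEY, Site TEXT, Status TEXT, CPUTime REAL);
""")
db.executemany("INSERT INTO Jobs VALUES (?, ?, ?, ?)", [
    (1, "DIRAC.CERN.ch", "done", 36000.0),
    (2, "DIRAC.CERN.ch", "running", None),
    (3, "DIRAC.SiteB.org", "done", -1.0),  # fails the validation rule below
])
db.commit()


def is_valid(row):
    """Hypothetical validation rule: finished job with a non-negative CPU time."""
    _job_id, _site, status, cpu = row
    return status == "done" and cpu is not None and cpu >= 0


def clean(db):
    """Move validated jobs from Jobs to AccountingJobs; return the moved ids."""
    rows = db.execute(
        "SELECT JobID, Site, Status, CPUTime FROM Jobs WHERE Status = 'done'"
    ).fetchall()
    moved = [row for row in rows if is_valid(row)]
    with db:  # one transaction: insert and delete commit (or roll back) together
        db.executemany("INSERT INTO AccountingJobs VALUES (?, ?, ?, ?)", moved)
        db.executemany("DELETE FROM Jobs WHERE JobID = ?", [(row[0],) for row in moved])
    return [row[0] for row in moved]


print(clean(db))
```

Jobs that fail validation stay in the monitoring table, where they remain visible for inspection instead of entering the historic record.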

17. Accounting Usage
- About 10 hits per day
- Time to generate the daily static reports: 8 min
  - 60-70% of the time (roughly 5 min) querying the database
  - 30-40% of the time (roughly 3 min) in the drawing package
- Server load < 0.2
- Total: 169k jobs

18. Outlook
- Monitoring page
  - Transactions in monitoring updates
  - Further optimisation (bulk operations, ...)
  - Search for a faster rendering package
  - Make the web page dynamic: fewer reloads
- Accounting
  - New report types:
    - Normalized CPU
    - Contribution by country
    - Rate by site, country, etc.

19. URLs
- Monitoring page: http://fpegaes1.usc.es/dmon/DC04/joblist.html
  - Mirror: http://lhcb02.usc.cesga.es/dmon/DC04/joblist.html
- Direct link to overview pages: http://lhcb.ecm.ub.es/DC04/Monitoring
- Accounting page: http://lhcb.ecm.ub.es/DC04/Accounting/

