The Performance and Exception Monitoring Project Tim Smith IT/PDP.


1 The Performance and Exception Monitoring Project Tim Smith IT/PDP

2 Contents
2000/03/21 Tim Smith: FNAL workshop
Requirements
– Current systems' inadequacies
– Views + global metrics
– GQM + correlations
Framework
– Scalability issues
Project Status
– Tools survey
Details from Alessandro…

3 Current systems inadequacies
Independent alarm/monitoring systems
– System snapshot requires multiple displays
Independent agents which: monitor local / monitor remote / restart / alarm
– Calculate the same info multiple times and use it differently
Host based – no correlations
– Hosts complain about the perceived problem, not the real one
Operator only follows precise instructions
– Automation! (+ manual Remedy entry)
Separate static config DBs for alarms and machines

4 Visions of the Future
One tool, many purposes… Views:
– End-to-end, user, sysadmin, resource planning
1000's of PCs per cluster
– Living with failures + scalable solutions!
Assure a service; quorum of machines, NOT full complement
High level correlations; impact on a service
Quality of Service measures; Global Metrics
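The "quorum of machines, not full complement" idea could be sketched as below. This is only an illustration: the function name `service_assured`, the node names, and the quorum size are assumptions, not part of the PEM design.

```python
def service_assured(node_up, quorum):
    """Return True when at least `quorum` nodes are up.

    `node_up` maps a (hypothetical) node name to True/False.
    The service counts as assured even if some nodes have failed,
    as long as the quorum is still met.
    """
    nodes_alive = sum(1 for up in node_up.values() if up)
    return nodes_alive >= quorum


# Hypothetical cluster of four PCs, one of which has failed.
cluster = {"pc001": True, "pc002": True, "pc003": False, "pc004": True}
print(service_assured(cluster, quorum=3))  # quorum of 3 out of 4
```

With this view a single failed PC raises no service-level alarm; only a loss of quorum does.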

5 Global Metrics
Honour Service Definitions
“Availability of usable 3000 CUs batch”
– Machines up + FATMEN + LSF + licence server
“Availability of an interactive facility”
– ASIS available + low trivial response time
“Job turnaround time expectations”
“Time to service tape request”
+ Disk/Network bandwidths
+ CPU/Memory utilisations
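A global metric such as "availability of usable 3000 CUs batch" combines machine state with the state of supporting services. A minimal sketch, assuming each node carries a CU rating and each dependency (FATMEN, LSF, licence server) reports a simple up/down flag; the `Node` class, field names, and sample figures are hypothetical:

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str        # hypothetical host name
    cpu_units: int   # assumed per-node capacity in CUs
    up: bool


def batch_service_available(nodes, dependencies, required_cus=3000):
    """True when enough CUs are up AND every supporting service is up."""
    usable_cus = sum(n.cpu_units for n in nodes if n.up)
    return usable_cus >= required_cus and all(dependencies.values())


nodes = [
    Node("batch01", 1500, True),
    Node("batch02", 1500, True),
    Node("batch03", 1500, False),
]
deps = {"FATMEN": True, "LSF": True, "licence_server": True}
print(batch_service_available(nodes, deps))
```

The point of the sketch is that the metric is judged against the service definition, not per host: one node is down here, yet the 3000 CU target is still met.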

6 Goal / Question / Metric
PDP Services
e.g. Monitor quality of Interactive Service
– Sufficient nodes?
– Low enough load?
– Slow to respond to commands?
– Contactable via network
  – Network daemons alive
  – No nologin
  – Free ptys
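The Goal / Question / Metric breakdown above maps naturally onto a small tree. The sketch below is an assumption about how it might be represented, not the PEM implementation; the class names, metric names, and thresholds are all hypothetical, and the questions are phrased so that True means healthy.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Metric:
    name: str                      # key into the sample dictionary
    ok: Callable[[float], bool]    # hypothetical threshold test


@dataclass
class Question:
    text: str
    metrics: List[Metric]

    def answer(self, samples: Dict[str, float]) -> bool:
        # A question is answered "yes" only if all its metrics pass.
        return all(m.ok(samples[m.name]) for m in self.metrics)


@dataclass
class Goal:
    name: str
    questions: List[Question]

    def evaluate(self, samples: Dict[str, float]) -> Dict[str, bool]:
        return {q.text: q.answer(samples) for q in self.questions}


interactive = Goal(
    name="Monitor quality of Interactive Service",
    questions=[
        Question("Sufficient nodes?", [Metric("nodes_up", lambda v: v >= 10)]),
        Question("Low enough load?", [Metric("load_avg", lambda v: v < 4.0)]),
        Question("Responds quickly to commands?",
                 [Metric("command_latency_s", lambda v: v < 1.0)]),
        Question("Contactable via network?", [
            Metric("daemons_alive", lambda v: v == 1),
            Metric("nologin_present", lambda v: v == 0),
            Metric("free_ptys", lambda v: v > 0),
        ]),
    ],
)

# Hypothetical measurements gathered from the service.
samples = {"nodes_up": 12, "load_avg": 5.2, "command_latency_s": 0.3,
           "daemons_alive": 1, "nologin_present": 0, "free_ptys": 30}
report = interactive.evaluate(samples)
for question, healthy in report.items():
    print(question, healthy)
```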

7 Correlations
Examples:
– Web server on “SUN cluster”
– Interactive Service
[Diagram]
Client 1: MV1, MV1, MV1
Client 2: MV2, MV2, MV2
Correlation Server: MV3, MV3, MV3

8 Framework Diagram

9 Scalability
Avoid bottlenecks by allowing for multiplicity of all components
Guiding principle: to avoid the PEM design being constrained by “possible” performance worries

10 Project Status
Approval as divisional project
– Interest in EFF and GRID projects
Documents Produced:
– User Requirements
– Tools survey
– Goal / Question / Metric
Analysis (end April)
Design (end May)
http://cern.ch/proj-pem > Progress > Analysis

11 Tools Survey
Enterprise / Cluster Management
– Tivoli, UnicenterTNG, Patrol, PCP, SCADA, Alinka, SCMS, MosixMON
Public Domain Tools
– MAT, GAP, Ranger (SLAC), VAMOS (DESY), rls (IN2P3)
Building blocks
– SNMP (Scotty, Advent, MRTG, UCD), JDMK
– PIKT, NetLogger, bonobo

