- GMA Athena (24mar03 - CHEP La Jolla, CA)
  - GMA Instrumentation of the Athena Framework using NetLogger
  - Dan Gunter, Wim Lavrijsen, David Quarrie, Brian Tierney, Craig Tull
  - HCG/NERSC/LBNL
  - CHEP 2003, La Jolla, CA, March 24, 2003
- The Problem
  - The ATLAS Athena framework has a large number of components.
  - When a job runs in a Grid environment and something goes wrong (e.g. the job runs slower than expected or crashes), it is very difficult to determine which component is at fault.
  - Constant, verbose logging generates too much information.
  - Solution: we are using NetLogger and pyGMA to instrument and monitor Athena.
- Athena/GAUDI Architecture
  - [Diagram labels: Application Manager; Algorithm; Converter; Event Data Service with Transient Event Store; Detector Data Service with Transient Detector Store; Histogram Service with Transient Histogram Store; a Persistency Service and Data Files for each store; Message Service; JobOptions Service; Particle Prop. Service; Other Services]
- Grid Testbed Topologies (2002)
  - EDG Testbed (star)
  - US ATLAS (mesh)
  - NorduGrid (mesh)
- Review: Grid Monitoring Architecture (GMA): Terminology and Architecture
  - (Performance) Event: a typed collection of data with a specific structure
  - Producer interface: makes performance data (events) available
  - Consumer interface: receives performance data (events)
  - Directory service:
    - supports information publication and discovery
    - must be distributed and/or replicated
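The GMA roles listed above can be illustrated with a minimal in-memory sketch. This is not the pyGMA API: the class names `Event`, `Producer`, `Consumer`, and `Directory`, their methods, and the event type string are all hypothetical, and the directory here is a single dictionary, whereas a real directory service would be distributed and/or replicated.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Event:
    """A typed collection of data with a specific structure."""
    type: str
    data: Dict[str, object]


class Producer:
    """Makes events of one type available to subscribed consumers."""

    def __init__(self, event_type: str) -> None:
        self.event_type = event_type
        self._subscribers: List[Callable[[Event], None]] = []

    def subscribe(self, callback: Callable[[Event], None]) -> None:
        self._subscribers.append(callback)

    def publish(self, data: Dict[str, object]) -> None:
        event = Event(self.event_type, data)
        for callback in self._subscribers:
            callback(event)


class Directory:
    """In-memory stand-in for the directory service (hypothetical)."""

    def __init__(self) -> None:
        self._producers: Dict[str, List[Producer]] = {}

    def register(self, producer: Producer) -> None:
        # Publication: a producer advertises the event type it offers.
        self._producers.setdefault(producer.event_type, []).append(producer)

    def lookup(self, event_type: str) -> List[Producer]:
        # Discovery: a consumer asks who produces a given event type.
        return list(self._producers.get(event_type, []))


class Consumer:
    """Receives events from every producer it finds in the directory."""

    def __init__(self) -> None:
        self.received: List[Event] = []

    def attach(self, directory: Directory, event_type: str) -> int:
        producers = directory.lookup(event_type)
        for producer in producers:
            producer.subscribe(self.received.append)
        return len(producers)


directory = Directory()
producer = Producer("athena.algorithm.timing")  # hypothetical event type
directory.register(producer)

consumer = Consumer()
consumer.attach(directory, "athena.algorithm.timing")
producer.publish({"algorithm": "ExampleAlg", "seconds": 0.25})
print(consumer.received[0])
```

The point of the split is that the consumer never needs to know a producer's location in advance; it only needs the event type and access to the directory.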
- Athena Distributed Instrumentation
  - Part of the SuperComputing 2002 ATLAS demo
  - IGMASvc (IMonitorSvc extension?)
    - Abstract application monitoring service
  - NetLogger
    - End-to-end monitoring and analysis of distributed systems
    - C, C++, Java, Python, Perl, Tcl APIs
    - Web Service Activation
  - Prophesy
    - An infrastructure for analyzing and modeling the performance of parallel and distributed applications
    - Normally a parse and auto-instrument approach (C and FORTRAN)
- DIDC Technologies Used
  - DIDC: LBNL's Data Intensive Distributed Computing Group
  - NetLogger provides
    - An easy-to-use instrumentation library
    - The ability to correlate data from various sources based on time
    - An easy way to collect data from multiple clients/servers reliably
    - Visualization and analysis tools
  - pyGMA provides
    - An easy-to-use producer and consumer Python library for constructing GGF-defined GMA services
  - Activation Service provides
    - The ability to remotely trigger and collect monitoring data in running Grid applications
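The idea of triggering and collecting monitoring data in a running application can be sketched as follows. This is a single-process illustration only, with no networking; `InstrumentedApp`, `ActivationService`, the numeric levels, and the job ID are hypothetical names invented here and do not reflect the actual activation service interface.

```python
class InstrumentedApp:
    """Hypothetical running application with switchable instrumentation."""

    def __init__(self):
        self.level = 0        # 0 = monitoring off
        self.collected = []   # events recorded since the last collection

    def log(self, level, name, **fields):
        # Record the event only if monitoring is activated at this level.
        if level <= self.level:
            self.collected.append((name, fields))


class ActivationService:
    """Hypothetical controller that turns monitoring on in registered apps."""

    def __init__(self):
        self._apps = {}

    def register(self, app_id, app):
        self._apps[app_id] = app

    def activate(self, app_id, level):
        # "Remote trigger": change the logging level while the app runs.
        self._apps[app_id].level = level

    def collect(self, app_id):
        # Hand back the recorded events and reset the app's buffer.
        app = self._apps[app_id]
        events, app.collected = app.collected, []
        return events


service = ActivationService()
app = InstrumentedApp()
service.register("job-1", app)

app.log(1, "execute.start", algorithm="ExampleAlg")  # dropped: monitoring off
service.activate("job-1", level=1)
app.log(1, "execute.end", algorithm="ExampleAlg")    # recorded
app.log(2, "store.retrieve", key="ExampleKey")       # dropped: level too high
collected = service.collect("job-1")
print(collected)
```

Keeping instrumentation off by default and raising the level only on request is one way to avoid the "constant, verbose logging" problem while still getting detail when a job misbehaves.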
- NetLogger Toolkit
  - DIDC has developed the NetLogger Toolkit (short for Networked Application Logger), which includes:
    - Tools that make it easy for distributed applications to log interesting events at every critical point
      - NetLogger client library (C, C++, Java, Perl, Python)
    - Tools for host and network monitoring
    - Event visualization tools that allow one to correlate application events with host/network events
    - NetLogger event archive and retrieval tools (new)
  - NetLogger combines network, host, and application-level monitoring to provide a complete view of the entire system.
  - Open Source
- GMASvc Service
  - Typical Athena abstract interface design
    - Dual-use library: linking Algorithms, etc. and loading DL
    - Concrete implementation using NetLogger
    - Properties to adjust: NetLogger On/Off/Level, Distinguished User Name, Activation Service
    - Controlled by environment variables
    - Use in Algorithms, Converters, StoreGate Store/Retrieve, etc.
  - GMAAuditor
    - Typical Athena Auditor bracketing the standard Algorithm methods (initialize, execute, finalize)
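The auditor pattern described above, bracketing each standard method with a start and an end event, can be sketched in plain Python. This is an illustration under stated assumptions, not the Athena or NetLogger API: `MonitorService`, `Auditor`, `ExampleAlgorithm`, and the environment variable `HYPOTHETICAL_GMA_MONITOR` are invented names.

```python
import os
import time


class MonitorService:
    """Hypothetical monitoring service; keeps events in a list."""

    def __init__(self, env=None):
        env = os.environ if env is None else env
        # HYPOTHETICAL_GMA_MONITOR is an invented variable name (assumption).
        self.enabled = env.get("HYPOTHETICAL_GMA_MONITOR", "on").lower() != "off"
        self.events = []

    def write(self, name, **fields):
        if self.enabled:
            self.events.append({"ts": time.time(), "event": name, **fields})


class Auditor:
    """Brackets a method call with '<method>.start' and '<method>.end' events."""

    def __init__(self, monitor):
        self.monitor = monitor

    def call(self, algorithm, method_name):
        alg = type(algorithm).__name__
        self.monitor.write(f"{method_name}.start", algorithm=alg)
        try:
            return getattr(algorithm, method_name)()
        finally:
            # The end event is written even if the method raises.
            self.monitor.write(f"{method_name}.end", algorithm=alg)


class ExampleAlgorithm:
    """Stand-in for an algorithm with the three standard methods."""

    def initialize(self):
        return True

    def execute(self):
        return sum(range(1000))

    def finalize(self):
        return True


monitor = MonitorService(env={})
auditor = Auditor(monitor)
alg = ExampleAlgorithm()
for method in ("initialize", "execute", "finalize"):
    auditor.call(alg, method)

names = [e["event"] for e in monitor.events]
print(names)
```

Because the bracketing lives in the auditor, the algorithm code itself needs no changes to be timed.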
- ATLAS Athena Monitoring Activation: SC02 Demo
- Activation Service Architecture
- Activation Service GUI
- NetLogger Analysis: Key Concepts
  - NetLogger visualization tools are based on time-correlated and object-correlated events.
    - Precision timestamps (default = microsecond)
  - If applications specify an “object ID” for related events, the NetLogger visualization tools can generate an object “lifeline”.
  - To associate a group of events into a “lifeline”, you must assign an “Event ID” to each NetLogger event.
    - Sample Event IDs: file name, block ID, frame ID, etc.
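The lifeline concept can be sketched as grouping timestamped events by their shared ID and ordering each group by time. The dictionary layout, the field names `ts`, `event`, and `id`, and the sample values are assumptions made for illustration; they are not the NetLogger log format or the behavior of its visualization tools.

```python
from collections import defaultdict

# Hypothetical events: timestamp in seconds, event name, and a shared ID.
events = [
    {"ts": 10.000003, "event": "read.start", "id": "block-2"},
    {"ts": 10.000001, "event": "read.start", "id": "block-1"},
    {"ts": 10.000450, "event": "read.end", "id": "block-1"},
    {"ts": 10.000900, "event": "read.end", "id": "block-2"},
]


def build_lifelines(events):
    """Group events by ID, with each group ordered by timestamp."""
    lifelines = defaultdict(list)
    for ev in sorted(events, key=lambda e: e["ts"]):
        lifelines[ev["id"]].append(ev)
    return dict(lifelines)


def lifeline_duration(lifeline):
    """Elapsed time in seconds between the first and last event of a lifeline."""
    return lifeline[-1]["ts"] - lifeline[0]["ts"]


lifelines = build_lifelines(events)
for object_id, line in lifelines.items():
    steps = " -> ".join(ev["event"] for ev in line)
    print(f"{object_id}: {steps} ({lifeline_duration(line) * 1e6:.0f} us)")
```

Events that lack a common ID cannot be joined this way, which is why every event in a lifeline needs one.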
- NLV Athena Example
- Completed Tasks
  - Instrumented several Athena components with NetLogger
  - Developed a prototype activation service
  - Developed a prototype interface to the activation service for Athena monitoring events
  - Demonstrated at SC02
- Current Work
  - We are now expanding on the components used in the SC02 demo
    - Develop a “proof of concept” general-purpose Grid troubleshooting architecture in concert with GANGA, Athena, and the DOE Science Grid
  - Tasks include
    - Further integration of ATLAS software with Globus (related to the Large ITR work)
    - Further NetLogger instrumentation of Globus, GANGA, and Athena
    - Redesign of the activation service for increased performance
    - Integration with Karlo Berket’s scalable and secure peer-to-peer resource discovery service, which will be used to locate producers
- For More Information
  - NetLogger:
  - SC02 Demo:
  - Athena: RE/OO/architecture/General/index.html
- Extra Slides (if you want more details)
- Monitoring Components
- Activation Service
- Ganglia Cluster Monitoring