
GridMonitor: Integration of Large Scale Facility Monitoring With MDS
Richard Baker, Antonio Chan, Jason Smith, Dantong Yu
USATLAS/RHIC Computing Facility, Brookhaven National Lab

6/27/2015 CHEP 03, La Jolla 2
Outline
• Requirements
• System Framework, Structure and Characteristics
• I: Ganglia and Its Information Provider
• II: Archive and Its Information Provider
• Gridview, Front-End System: http://heppc1.uta.edu/atlas/grid-status/mds.gremlin.usatlas.bnl.gov.html
• Current Status and Future Work

Requirements
• Modularity and Extensibility: Make Use of Existing Monitoring Pieces
• Flexibility: Adjustable to the Dynamics of the Monitored Systems
• Overhead: Non-intrusive
• Scalability
• Security, Consistency, Interoperability, Etc.

What Needs to Be Monitored
• Linux Farm Monitoring
  - Description: About 1100 Dual-CPU Linux Nodes. Performance Data Must Be Summarized for Advertising to the Grid.
  - Performance Events Required:
    - Configuration Information
    - Status Information: CPU Load (1, 5, 10, 15), Memory Load, Disk Load, and Network Load
  - Example Usage: A Resource Broker Might Query the Availability of Linux Farm System Resources in Order to Plan the Efficient Execution of Tasks

More…
• Network Monitoring
  - Description: 8 USATLAS Testbeds. Publish the Connectivity of These Testbeds and Monitor the Health of the USATLAS Network. Archived Performance Data Can Be Used to Predict Network Behavior, So That a User Can Choose the Source and Destination for File Replication.
  - Performance Events Required: Bandwidth, Delay (Round-Trip Time), Traceroute
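The replica-selection use case above can be sketched as follows. This is a minimal illustration, not the facility's method: it assumes archived bandwidth samples are kept per (source, destination) path and uses a simple moving average as the predictor. The site names, sample values, and function names are hypothetical.

```python
from statistics import mean

# Hypothetical archive: bandwidth samples (Mbit/s), oldest first, per (source, destination) path.
ARCHIVE = {
    ("site-a", "site-d"): [80, 90, 100],
    ("site-b", "site-d"): [200, 120, 60, 60],
    ("site-c", "site-d"): [],  # path never measured
}


def predict_bandwidth(samples, window=3):
    """Predict the next value as the mean of the last `window` samples (moving average)."""
    return mean(samples[-window:])


def choose_source(archive, destination, candidates, window=3):
    """Pick the candidate replica source with the highest predicted bandwidth."""
    best, best_bw = None, float("-inf")
    for source in candidates:
        samples = archive.get((source, destination), [])
        if not samples:
            continue  # no history for this path, so no prediction
        predicted = predict_bandwidth(samples, window)
        if predicted > best_bw:
            best, best_bw = source, predicted
    return best, best_bw


if __name__ == "__main__":
    print(choose_source(ARCHIVE, "site-d", ["site-a", "site-b", "site-c"]))
```

With a three-sample window, site-a is chosen even though site-b's full history averages higher, because site-b's recent measurements have dropped. The window length trades responsiveness against noise.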

Monitoring Framework
[Figure: monitoring framework diagram. Components shown: sensors on the monitored resources (server, HPSS, network, computing nodes); data collectors; a monitoring database (ODBC + MySQL) or RRD database; four information providers (GRIS); an aggregate index service (GIIS); grid-info-search; and Grid-View (web server).]

Monitoring System Components: Four-Tier Structure
• Sensors
  - Host: Ganglia, top, /proc, and LSF Host Load
• Archive System (Database System)
  - Round Robin Database (RRD)
  - Relational Database: unixODBC + MyODBC + MySQL
• Information Providers
  - Monitoring and Discovery Service (MDS 2.2), GLUE Schema, a Customized Ganglia Client Tool Reporting the Latest Monitoring Data, and Database Client Tools Reporting the Summary Information
• Front-End Browsing System
  - Gridview (Grid Visualization Tool Developed at the University of Texas at Arlington)
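The /proc sensor option can be sketched as a small reader for /proc/loadavg and /proc/meminfo. This is an illustration under stated assumptions, not the facility's sensor: the memory-load definition (everything except MemFree counts as used) and the function names are choices made here. Note that the kernel's /proc/loadavg reports 1-, 5-, and 15-minute load averages.

```python
from pathlib import Path


def parse_loadavg(text):
    """Parse /proc/loadavg: '<1-min> <5-min> <15-min> <running>/<total> <last_pid>'."""
    one, five, fifteen = (float(x) for x in text.split()[:3])
    # Keys mirror Ganglia's metric names for the same quantities.
    return {"load_one": one, "load_five": five, "load_fifteen": fifteen}


def parse_meminfo(text):
    """Parse 'Name:  value kB' lines of /proc/meminfo into {name: value_in_kB}."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields and fields[0].isdigit():  # skip header-style lines without a number
            info[name.strip()] = int(fields[0])
    return info


def memory_load_percent(meminfo):
    """Share of memory not reported as MemFree (buffers and page cache count as used)."""
    total = meminfo["MemTotal"]
    return 100.0 * (total - meminfo["MemFree"]) / total


def read_host_sample(proc=Path("/proc")):
    """Take one sample from a live Linux host; raises on systems without /proc."""
    sample = parse_loadavg((proc / "loadavg").read_text())
    sample["mem_load_percent"] = memory_load_percent(
        parse_meminfo((proc / "meminfo").read_text()))
    return sample
```

Keeping the parsers separate from the file reads lets the same code be exercised on captured text, and makes it straightforward to swap in another sensor (Ganglia, top) behind the same dictionary shape.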

Advantages
• The Information Provider Caches the Newest Values From the MySQL Database
• Non-intrusiveness: The Information Provider Eliminates Random User Accesses to the Database Server
• Scalability Can Be Significantly Increased
  - 1000 Linux Nodes Are Being Monitored
  - Network Connectivity of Eight USATLAS Testbeds: Each Site Monitors the Paths From Itself to the Other Seven, So the Network Topology and Traffic Can Be Easily Constructed
• Flexibility
  - Independent of Sensors: Many Sensors Can Be Easily Plugged In As Long As They Have a Well-Defined Protocol and API; We Could Switch Among Ganglia, top, and /proc
  - The Archive System Is Independent of the Underlying Database: It Can Be Any RDBMS (Oracle, MySQL, Sybase, Informix), Flat Files, or Objectivity, As Long As ODBC Drivers Are Available
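The full-mesh arrangement described above (eight testbeds, each measuring its paths to the other seven, i.e. 8 × 7 = 56 directed paths) can be assembled into a topology view roughly as sketched below. The site names, the use of round-trip time as the metric, and the report format are hypothetical; this is an illustration, not the deployed code.

```python
from itertools import permutations


def monitored_paths(sites):
    """Full mesh: each site measures the directed path from itself to every other site."""
    return list(permutations(sites, 2))


def build_topology(reports, sites):
    """Merge per-site reports {(src, dst): rtt_ms} into a matrix; None marks a missing report."""
    matrix = {src: {dst: None for dst in sites if dst != src} for src in sites}
    for (src, dst), rtt_ms in reports.items():
        matrix[src][dst] = rtt_ms  # KeyError here flags a report for an unknown site
    return matrix


def missing_paths(matrix):
    """Paths with no measurement, e.g. a down link or a site whose sensor stopped reporting."""
    return [(src, dst) for src, row in matrix.items()
            for dst, rtt in row.items() if rtt is None]


if __name__ == "__main__":
    sites = [f"site{i}" for i in range(1, 9)]  # eight hypothetical testbeds
    paths = monitored_paths(sites)
    reports = {p: 12.5 for p in paths if p != ("site3", "site7")}
    print(len(paths), missing_paths(build_topology(reports, sites)))
```

Because every site only reports its own outgoing paths, no single site needs global knowledge; the complete picture exists only after the per-site reports are merged.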

I: Ganglia Monitoring with MDS
• Ganglia Information Provider
  - Front-End: GLUE Schema, http://www.cnaf.infn.it/~sergio/datatag/glue/
  - Back-End: XML
[Figure: "Layered Gmetad" diagram. gmond on the Cluster A multicast channel produces XML, which passes through filtered gmetad layers (…) to the MDS Ganglia information provider (IP), which maps the XML to GLUE.]
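The XML-to-GLUE step of the Ganglia information provider can be sketched as follows. This is a minimal illustration, not the BNL provider: it parses a trimmed, hypothetical document in the shape of gmond's XML output (CLUSTER, HOST, and METRIC elements with the cpu_num and load_one metrics), summarizes each cluster, and emits an entry using only the GLUE attributes visible in the grid-info-search output in this talk. The GlueClusterUniqueID pattern (name with underscores, a hyphen, then a group suffix) is inferred from that output, and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical sample in the shape of gmond's XML dump (real output has more attributes).
SAMPLE = """<GANGLIA_XML VERSION="2.5.0" SOURCE="gmond">
 <CLUSTER NAME="ATLAS Linux Cluster" LOCALTIME="1047000000" OWNER="BNL">
  <HOST NAME="node001" IP="10.0.0.1" REPORTED="1047000000">
   <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
   <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
  </HOST>
  <HOST NAME="node002" IP="10.0.0.2" REPORTED="1047000000">
   <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
   <METRIC NAME="load_one" VAL="0.50" TYPE="float" UNITS=""/>
  </HOST>
 </CLUSTER>
</GANGLIA_XML>"""


def cluster_summaries(xml_text):
    """Summarize every CLUSTER element (iter() also finds clusters nested under GRID)."""
    summaries = []
    for cluster in ET.fromstring(xml_text).iter("CLUSTER"):
        hosts, cpus, loads = 0, 0, []
        for host in cluster.iter("HOST"):
            metrics = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
            hosts += 1
            cpus += int(metrics.get("cpu_num", 0))
            if "load_one" in metrics:
                loads.append(float(metrics["load_one"]))
        summaries.append({
            "name": cluster.get("NAME"),
            "hosts": hosts,
            "cpus": cpus,
            "avg_load_one": sum(loads) / len(loads) if loads else None,
        })
    return summaries


def to_glue_ldif(summary, group):
    """Render a cluster entry with the GLUE attributes seen in the grid-info-search output."""
    name = summary["name"]
    return "\n".join([
        f"# {name}, local, grid",
        f"dn: cl={name}, mds-vo-name=local, o=grid",
        "objectClass: GlueClusterTop",
        "objectClass: GlueCluster",
        f"GlueClusterName: {name}",
        f"GlueClusterUniqueID: {name.replace(' ', '_')}-{group}",  # pattern inferred from the output
        "GlueClusterService: compute",
    ])


if __name__ == "__main__":
    for s in cluster_summaries(SAMPLE):
        print(s)
        print(to_glue_ldif(s, "RCF_and_ACF_Linux_Farm_Group"))
```

Which GLUE attributes should carry the CPU and load summaries depends on the GLUE schema version in use, so this sketch leaves them in the returned dictionary rather than guessing attribute names.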

I: Ganglia Monitoring with MDS
gremlin % grid-info-search -x -h spider.usatlas.bnl.gov -s one

# ATLAS Linux Cluster, local, grid
dn: cl=ATLAS Linux Cluster, mds-vo-name=local, o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
GlueClusterName: ATLAS Linux Cluster
GlueClusterUniqueID: ATLAS_Linux_Cluster-RCF_and_ACF_Linux_Farm_Group
GlueClusterService: compute

# PHOBOS CAS Linux Cluster, local, grid
dn: cl=PHOBOS CAS Linux Cluster, mds-vo-name=local, o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
GlueClusterName: PHOBOS CAS Linux Cluster
GlueClusterUniqueID: PHOBOS_CAS_Linux_Cluster-RCF_and_ACF_Linux_Farm_Group
GlueClusterService: compute

# STAR CAS Linux Cluster, local, grid
dn: cl=STAR CAS Linux Cluster, mds-vo-name=local, o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
GlueClusterName: STAR CAS Linux Cluster
GlueClusterUniqueID: STAR_CAS_Linux_Cluster-RCF_and_ACF_Linux_Farm_Group
GlueClusterService: compute
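A client consuming this output, for example a script on the broker side, might turn the LDIF text into records as sketched below. This is a minimal, hypothetical parser for simple output like the listing above; a production client would use an LDAP library or a full LDIF parser, since this sketch ignores line folding, base64-encoded values, and the case-insensitivity of LDAP attribute names.

```python
SAMPLE_OUTPUT = """\
# ATLAS Linux Cluster, local, grid
dn: cl=ATLAS Linux Cluster, mds-vo-name=local, o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
GlueClusterName: ATLAS Linux Cluster
GlueClusterService: compute

# STAR CAS Linux Cluster, local, grid
dn: cl=STAR CAS Linux Cluster, mds-vo-name=local, o=grid
objectClass: GlueClusterTop
objectClass: GlueCluster
GlueClusterName: STAR CAS Linux Cluster
GlueClusterService: compute
"""


def parse_ldif(text):
    """Split simple LDIF into entries of {attribute: [values]}.

    Blank lines end an entry, '#' lines are comments, and repeated attributes
    (such as objectClass) accumulate into lists.
    """
    entries, current = [], {}
    for line in text.splitlines() + [""]:  # trailing "" flushes the last entry
        if not line.strip():
            if "dn" in current:  # ignore stray records such as 'version: 1'
                entries.append(current)
            current = {}
        elif not line.startswith("#"):
            attr, _, value = line.partition(":")
            current.setdefault(attr.strip(), []).append(value.strip())
    return entries


def compute_clusters(entries):
    """Names of GlueCluster entries that advertise the compute service."""
    return [e["GlueClusterName"][0] for e in entries
            if "GlueCluster" in e.get("objectClass", [])
            and "compute" in e.get("GlueClusterService", [])]


if __name__ == "__main__":
    print(compute_clusters(parse_ldif(SAMPLE_OUTPUT)))
```

Splitting on the first colon only keeps values intact even when they contain further colons; in practice the text would come from running grid-info-search, a step omitted here.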

II: Farm Monitoring
• The Linux Farm Is Divided Into Sub-clusters Based on Site Policy, Experiment, OS and Version, and CPU Speed. A Sub-cluster Contains Hosts With the Same Configuration
• The BNL ATLAS Farm Is Partitioned Into Sub-clusters by CPU Speed: CPU400MHz, CPU700MHz, CPU1GHz, CPU1.4GHz, and CPU2.4GHz
• The Status Information of a Sub-cluster Is Summarized From All Nodes in That Sub-cluster
• The Grid Resource Broker Schedules at the Level of Farm Sub-clusters
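The per-sub-cluster summary can be sketched as below. It is a minimal illustration, assuming each node sample carries its sub-cluster label, CPU count, clock speed, load, and user/system CPU percentages (all field names and values are hypothetical). The output keys reuse the Queue-Info attribute names from the schema in this talk, and averaging the per-node values is a choice made here, not necessarily what the provider does.

```python
from collections import defaultdict
from statistics import mean


def summarize_subclusters(nodes):
    """Group node samples by sub-cluster and compute Queue-Info style summaries."""
    groups = defaultdict(list)
    for node in nodes:
        groups[node["subcluster"]].append(node)
    summaries = {}
    for name, members in groups.items():
        summaries[name] = {
            "MdsQueueNumberOfCpu": sum(n["cpus"] for n in members),
            # Hosts in a sub-cluster share one configuration, so any member's speed will do.
            "MdsQueueSpeed": members[0]["mhz"],
            "MdsQueueAverageLoad": mean(n["load"] for n in members),
            "MdsQueueAverageUserPercent": mean(n["user_pct"] for n in members),
            "MdsQueueAverageSysPercent": mean(n["sys_pct"] for n in members),
        }
    return summaries


# Hypothetical node samples: one dual-CPU host per entry.
NODES = [
    {"subcluster": "cpu1ghz", "cpus": 2, "mhz": 1000, "load": 1.0, "user_pct": 50, "sys_pct": 10},
    {"subcluster": "cpu1ghz", "cpus": 2, "mhz": 1000, "load": 3.0, "user_pct": 70, "sys_pct": 20},
    {"subcluster": "cpu2.4ghz", "cpus": 2, "mhz": 2400, "load": 0.5, "user_pct": 20, "sys_pct": 5},
]

if __name__ == "__main__":
    for name, summary in summarize_subclusters(NODES).items():
        print(name, summary)
```

Summarizing at the sub-cluster level keeps the published data proportional to the number of sub-clusters rather than to the roughly 1100 nodes, which matches the broker scheduling at that granularity.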

Information Schema (Linux Farm Monitoring)
Queue-Info:
objectclass ( 1.3.6.1.4.1.3536.2.6.0.0.0.0
    NAME 'Queue-Info'
    SUP 'Mds'
    STRUCTURAL
    MUST ( MdsQueueNumberOfCpu $ MdsQueueSpeed $ MdsQueueAverageLoad $
           MdsQueueAverageUserPercent $ MdsQueueAverageSysPercent ) )
• Needs to Be Replaced by the GLUE Schema
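An entry of this objectclass must supply all five MUST attributes. The sketch below, a hypothetical helper rather than the talk's provider script, checks that requirement before rendering the entry as LDIF. Listing Mds alongside Queue-Info follows the SUP clause; whether MDS 2.2 expects further attributes on such entries is not covered here.

```python
QUEUE_INFO_MUST = (
    "MdsQueueNumberOfCpu",
    "MdsQueueSpeed",
    "MdsQueueAverageLoad",
    "MdsQueueAverageUserPercent",
    "MdsQueueAverageSysPercent",
)


def queue_info_ldif(dn, attrs):
    """Render a Queue-Info entry as LDIF, refusing entries that lack a MUST attribute."""
    missing = [name for name in QUEUE_INFO_MUST if name not in attrs]
    if missing:
        raise ValueError(f"Queue-Info entry lacks MUST attributes: {missing}")
    lines = [f"dn: {dn}", "objectclass: Mds", "objectclass: Queue-Info"]
    # Emit exactly the schema's MUST attributes, in schema order.
    lines += [f"{name}: {attrs[name]}" for name in QUEUE_INFO_MUST]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    dn = ("MdsFarmQueueName=1000, MdsHostNodeDomainName=usatlas.bnl.gov, "
          "Mds-Host-hn=gremlin.usatlas.bnl.gov, Mds-Vo-name=local, o=grid")
    print(queue_info_ldif(dn, {
        "MdsQueueNumberOfCpu": 4, "MdsQueueSpeed": 1000, "MdsQueueAverageLoad": 2.0,
        "MdsQueueAverageUserPercent": 60, "MdsQueueAverageSysPercent": 15,
    }))
```

The distinguished name in the usage example is taken from the provider entry in this talk; the attribute values are made up. Failing loudly on a missing attribute is preferable to publishing an entry that the directory server would reject as a schema violation.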

Information Provider (Linux Farm Monitoring)
# generate Farm information every 10 minutes
dn: MdsFarmQueueName=1000, MdsHostNodeDomainName=usatlas.bnl.gov, Mds-Host-hn=gremlin.usatlas.bnl.gov, Mds-Vo-name=local, o=grid
objectclass: GlobusTop
objectclass: GlobusActiveObject
objectclass: GlobusActiveSearch
type: exec
path: /usr/local/globus-new/customize
base: mds-farm-batch-info.pl
args: -dn MdsFarmQueueName=1000,MdsHostNodeDomainName=usatlas.bnl.gov,Mds-Host-hn=gremlin.usatlas.bnl.gov,Mds-Vo-name=local,o=grid -ttl 900
cachetime: 600
timelimit: 20
sizelimit: 400
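The cachetime and timelimit fields can be read as a caching policy around the provider command. The sketch below models that policy in Python as assumed here: output younger than cachetime seconds (600, matching the "every 10 minutes" comment) is reused, and a run longer than timelimit seconds is abandoned. It does not reproduce the actual GRIS behavior; sizelimit and the provider's -ttl argument are ignored, and the class name and interface are hypothetical.

```python
import subprocess
import sys
import time


class CachedProvider:
    """Serve an information provider's output, re-running it only when the cache is stale.

    Modeled loosely on the entry's fields (as assumed here): output younger than
    `cachetime` seconds is reused, and a run exceeding `timelimit` seconds is
    abandoned, in which case the previous output (possibly None) is returned.
    """

    def __init__(self, argv, cachetime=600, timelimit=20, clock=time.monotonic):
        self.argv = argv
        self.cachetime = cachetime
        self.timelimit = timelimit
        self.clock = clock
        self.output = None
        self.fetched_at = None
        self.runs = 0  # successful provider executions

    def get(self):
        now = self.clock()
        if self.fetched_at is not None and now - self.fetched_at < self.cachetime:
            return self.output  # cache still fresh
        try:
            result = subprocess.run(self.argv, capture_output=True, text=True,
                                    timeout=self.timelimit, check=True)
        except (subprocess.TimeoutExpired, subprocess.CalledProcessError):
            return self.output  # keep serving the last good output
        self.runs += 1
        self.output, self.fetched_at = result.stdout, now
        return self.output


if __name__ == "__main__":
    # Stand-in for a provider script such as mds-farm-batch-info.pl.
    provider = CachedProvider([sys.executable, "-c", "print('dn: o=grid')"])
    print(provider.get(), provider.get(), provider.runs)
```

Caching in front of the provider is what keeps queries non-intrusive: however many clients ask, the underlying script (and the database behind it) runs at most once per cache period.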

Observation from Grid-View

Current Status and Future Work
• Current Status
  - Sensors and Local Monitoring Tools Impose Less Than 1 Percent CPU Load: Non-intrusive
  - Improved the Ganglia Information Provider: It Can Obtain Information From Both Gmond and Gmetad
  - Multiple and Hierarchical Clusters Are Supported
• Future Work
  - Merge the Ganglia RRD Information Provider and the Archive DB Information Provider
  - Work With the Ganglia Team and the GLUE Schema Effort to Help Define Requirements for What Information Should Be Monitored for Job Scheduling
  - Automate the Mapping From XML to the GLUE Schema to Provide Flexibility
  - Continue to Optimize the Information Provider to Deliver Data Faster
  - Scalability Test
  - Extend This Prototype to the Monitoring of Other Facilities
