Fulvio Galeazzi, CHEP 2003, Mar 24-28, 2003: A Monitoring System for the BaBar INFN Computing Cluster. Moreno Marzolla, Università “Ca' Foscari” di Venezia.

Presentation transcript:

A Monitoring System for the BaBar INFN Computing Cluster
Moreno Marzolla, Università “Ca' Foscari” di Venezia and INFN, Padova
Valerio Melloni, Dip. Matematica, Università di Ferrara
Presented by: Fulvio Galeazzi, INFN, Padova

Talk Outline
● Introduction
● Motivation: Monitoring the BaBar Computing Farm
● PerfMC: A prototype of an SNMP-based monitoring application
● Conclusions

Monitoring
● A monitor is a tool used to observe the activities on a system
  – Collects performance measures
  – (Possibly) analyzes the data
  – Displays the results
● Why?
  – Measure resource utilization to find performance bottlenecks
  – Characterize the workload
  – Find model parameters, validate models, or develop inputs for a model

BaBar INFN Padova
● 170 2xPIII 1.26 GHz machines, 1 GB RAM, RH Linux 7.2
  – 130 clients
  – 40 servers
● Tape library with a capacity of ~70 TB, not compressed
● Network switches, UPSes, environmental conditioning systems, ...

Monitoring Requirements
● Hardware status
  – Machine crashes, CPU utilization, disk I/O, network I/O, ...
● Process status
● Environmental conditions
  – Humidity, temperature, UPS status, ...
● Does not need to be a real-time monitor
● The monitoring system should also be:
  – Reasonably scalable
  – Efficient (low resource requirements)
  – Flexible and customizable
  – Easy to configure
  – Able to operate in the background

Some problems with existing tools
● Limited scalability
● Require their own dæmons running on the monitored hosts
● Can't install a dæmon on a network switch, or on a tape library
● Hard to configure
● Poorly implemented
● Heavy use of scripting languages, mixed C/Perl/shell pieces

PerfMC: a Performance Monitor for Clusters
● Characteristics:
  – Written in C
  – Asynchronous (nonblocking) parallelized SNMP polling
  – Uses SNMPv2 Bulk Get requests
  – XML-based configuration file
  – The RRDTool package is used to store data and produce graphs
    ● Old data have lower resolution than recent ones
    ● Round Robin Databases have known, fixed size
    ● Graphing capabilities are provided by the library
  – Dynamic generation of HTML pages using XSLT stylesheets and a filter (PHP...)
    ● PerfMC has an embedded HTTP server
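The "asynchronous (nonblocking) parallelized polling" idea can be illustrated with a small sketch. This is not PerfMC's code (PerfMC is written in C) and it sends no SNMP traffic: `poll_host` is a hypothetical stand-in that only simulates latency, and the limit of 50 in-flight requests merely echoes the `numconnections` value in the sample configuration, whose exact meaning is an assumption here.

```python
import asyncio
import random


async def poll_host(host: str) -> dict:
    """Hypothetical stand-in for one SNMP bulk request (no real network I/O)."""
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulated round-trip time
    return {"host": host, "cpu_load": round(random.uniform(0.0, 1.0), 2)}


async def poll_all(hosts: list[str], max_in_flight: int = 50) -> list[dict]:
    """Poll all hosts concurrently, with at most max_in_flight open requests."""
    gate = asyncio.Semaphore(max_in_flight)

    async def guarded(host: str) -> dict:
        # The semaphore bounds concurrency; waiting here never blocks the loop.
        async with gate:
            return await poll_host(host)

    # gather() returns results in the same order as the input hosts.
    return await asyncio.gather(*(guarded(h) for h in hosts))


if __name__ == "__main__":
    farm = [f"node{i:03d}" for i in range(170)]
    samples = asyncio.run(poll_all(farm))
    print(len(samples), "hosts polled")
```

Because no request waits for another to finish, a slow or dead host delays only its own slot rather than the whole polling cycle.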

PerfMC Architecture
[Figure: architecture diagram. Labels: XML Configuration File (<?xml version="1.0" standalone="no"?> <!DOCTYPE monitor SYSTEM "monitor.dtd">...); PerfMC; SNMP Poller; In-core Status; RRD; XSLT Engine; HTTPD; Monitored Hosts (Host 1, Host 2, Host n); Stylesheets; Filter (PHP...); Graphs; HTML Pages]

Sample HTML Output

Another example

Some Plots
[Figure: plots for a Database Server and a Client Machine]

PerfMC Performances
● Reasonably low CPU utilization (< 2%)
● Reasonably low network utilization (11 KB/s)
● Not-so-low disk utilization (1.2 MB/s)

Conclusions
● Monitoring a large computing cluster is a highly nontrivial task.
● Many monitoring tools exist, but many of them are not adequate for large distributed systems.
● We are trying to build a general-purpose SNMP and XML-based monitoring tool.
● A prototype exists and is working well.

Future work
● Alarms are not currently implemented, but are at the top of the to-do list
● At the moment there are no serious problems
  – PerfMC is running on the production cluster ([…] machines)
  – Clearly cannot scale forever
  – Single point of failure
● No attention has been paid to security
  – Could easily be extended for SNMPv3
  – Not a priority: our cluster is on a private network

Bibliography
● Moreno Marzolla, A Performance Monitoring System for Large Computing Clusters, Proceedings of PDP2003, Genova, Italy, Feb 5-7, 2003
● W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, third edition, Addison-Wesley, 1999
● Grid Performance Working Group
● RRD Tools Home Page
● BaBar Farm home page (will contain PerfMC)
● BaBar Farm Monitoring Page

Backup Slides

Example of XML Configuration File
<monitor numconnections="50"
         pmclogfile="/monitor/pmc.log"
         httpdlogfile="/dev/null"
         rrddir="/monitor"
         htmldir="/monitor/html"
         pmcverbosity="3" >
This is a sample client machine
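As an illustration of how a tool might read the top-level settings shown above, here is a minimal sketch using Python's standard XML parser. Only the `<monitor>` attributes come from the slide; the closing tag, the empty element body, and the `load_settings` helper are assumptions made so the fragment parses, since the inner elements of the original file did not survive.

```python
import xml.etree.ElementTree as ET

# Attributes copied from the slide; closing tag and empty body are assumed.
SAMPLE = """<monitor numconnections="50"
         pmclogfile="/monitor/pmc.log"
         httpdlogfile="/dev/null"
         rrddir="/monitor"
         htmldir="/monitor/html"
         pmcverbosity="3">
</monitor>"""


def load_settings(xml_text: str) -> dict:
    """Return the <monitor> attributes, converting numeric ones to int."""
    root = ET.fromstring(xml_text)
    if root.tag != "monitor":
        raise ValueError(f"expected <monitor> root element, got <{root.tag}>")
    settings = dict(root.attrib)
    # Treating these two attributes as integers is a guess based on their values.
    for key in ("numconnections", "pmcverbosity"):
        if key in settings:
            settings[key] = int(settings[key])
    return settings


if __name__ == "__main__":
    for name, value in load_settings(SAMPLE).items():
        print(f"{name} = {value!r}")
```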

Sample XML configuration file (MIB)

Example of XML Status Dump

More on SNMP
[Figure: SNMP protocol architecture. On one side, a Management Application sits on a stack of SNMP, UDP, IP and the network protocol; on the other, Managed Resources and SNMP Managed Objects sit on the same stack. The two sides exchange GetRequest, GetNextRequest, SetRequest, GetResponse and Trap messages across a LAN/WAN.]
Source: W. Stallings, SNMP, SNMPv2, SNMPv3 and RMON 1 and 2, 3rd edition, p. 81
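To make the GetRequest / GetNextRequest semantics concrete, the sketch below simulates an agent in memory and walks one subtree with repeated GetNext calls. No SNMP library or network is involved: `ToyAgent`, `parse_oid` and `walk` are hypothetical names, and the values stored in the toy table are invented. The OIDs themselves are the standard MIB-II ones for sysDescr, sysUpTime, sysName and ifNumber.

```python
from bisect import bisect_right


def parse_oid(text: str) -> tuple:
    """Turn '1.3.6.1' into (1, 3, 6, 1); tuples sort in OID order."""
    return tuple(int(part) for part in text.strip(".").split("."))


class ToyAgent:
    """In-memory stand-in for an SNMP agent's table of managed objects."""

    def __init__(self, objects: dict):
        pairs = sorted(((parse_oid(k), v) for k, v in objects.items()),
                       key=lambda pair: pair[0])
        self._oids = [oid for oid, _ in pairs]
        self._values = [value for _, value in pairs]

    def get_next(self, oid: tuple):
        """Return (oid, value) of the first object strictly after oid, or None."""
        i = bisect_right(self._oids, oid)
        if i == len(self._oids):
            return None  # end of the MIB view
        return self._oids[i], self._values[i]


def walk(agent: ToyAgent, root: tuple) -> list:
    """Collect every object under root by chaining GetNext requests."""
    found, current = [], root
    while True:
        nxt = agent.get_next(current)
        if nxt is None or nxt[0][:len(root)] != root:
            return found  # left the subtree or ran out of objects
        found.append(nxt)
        current = nxt[0]


if __name__ == "__main__":
    agent = ToyAgent({
        "1.3.6.1.2.1.1.1.0": "Linux client (invented value)",  # sysDescr.0
        "1.3.6.1.2.1.1.3.0": 123456,                           # sysUpTime.0
        "1.3.6.1.2.1.1.5.0": "node001",                        # sysName.0
        "1.3.6.1.2.1.2.1.0": 2,                                # ifNumber.0
    })
    for oid, value in walk(agent, parse_oid("1.3.6.1.2.1.1")):
        print(".".join(map(str, oid)), "=", value)
```

An SNMPv2 Bulk Get, as used by PerfMC, returns several such "next" objects in a single response, which cuts the number of round trips a walk like this needs.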

Round Robin Databases
[Figure: value plotted against time, with labels "Recent data", "Old data" and "Drop"]
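The two properties stated earlier in the talk (fixed size, and lower resolution for old data) can be sketched with a small ring buffer. This is a simplified illustration of the idea, not RRDTool's file format or API: `RoundRobinArchive` is a hypothetical class, and averaging is only one of several possible consolidation functions.

```python
class RoundRobinArchive:
    """Fixed-size ring of consolidated samples; the oldest slot is overwritten."""

    def __init__(self, slots: int, steps_per_slot: int = 1):
        self.slots = slots
        self.steps_per_slot = steps_per_slot  # raw samples averaged into one slot
        self._ring = [None] * slots
        self._next = 0       # index of the slot that will be overwritten next
        self._pending = []   # raw samples waiting to be consolidated

    def update(self, value: float) -> None:
        """Feed one raw sample; store an average once enough have arrived."""
        self._pending.append(value)
        if len(self._pending) == self.steps_per_slot:
            self._ring[self._next] = sum(self._pending) / len(self._pending)
            self._next = (self._next + 1) % self.slots
            self._pending.clear()

    def values(self) -> list:
        """Stored values from oldest to newest, skipping never-filled slots."""
        ordered = self._ring[self._next:] + self._ring[:self._next]
        return [v for v in ordered if v is not None]


if __name__ == "__main__":
    recent = RoundRobinArchive(slots=4, steps_per_slot=1)  # full resolution
    older = RoundRobinArchive(slots=4, steps_per_slot=3)   # 3 samples per slot
    for sample in range(1, 13):
        recent.update(sample)
        older.update(sample)
    print(recent.values())  # [9.0, 10.0, 11.0, 12.0]
    print(older.values())   # [2.0, 5.0, 8.0, 11.0]
```

Both archives use four slots, yet the second one covers three times as much history at one third of the resolution, and neither grows as more samples arrive.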