DataGrid Wide Area Network Monitoring Infrastructure (DWMI), aka IEPM-BW
Connie Logg, February 13 and 17, 2005
History
Originally developed for the SC2001 demo and called IEPM-BW
Development continued after SC2001
FNAL picked up IEPM-BW and adapted it to their site
In spring 2004, redesigned for the TeraPaths monitoring project
Still called IEPM-BW; currently deployed at 4 sites: SLAC, CALTECH, BNL, and CERN
Architecture - I
Uses a MySQL database
Written in perl
All configuration is kept in the database, so the code can configure itself; this also makes it easy to add new types of data
Low-impact probes have their own daemons (currently abwed, traced, and pingd), which run independently
High-impact probes are coordinated by a daemon (bw-synchd), which ensures that they never run simultaneously and that there is a break between consecutive tests
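The one-at-a-time rule that bw-synchd enforces can be illustrated with a minimal Python sketch. It assumes a hypothetical `run_probe` callable and a fixed `gap_seconds` pause; the real daemon is written in perl and takes its work from the database, so this shows only the "never concurrent, always a break" idea.

```python
import time
from typing import Callable, Iterable, List


def run_serialized(
    tests: Iterable[str],
    run_probe: Callable[[str], str],
    gap_seconds: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Run high-impact tests strictly one after another.

    `run_probe` and `gap_seconds` are hypothetical stand-ins. Each call to
    run_probe must return before the next test starts, and the loop pauses
    between consecutive tests so the path is left idle for a while.
    """
    results = []
    for i, test in enumerate(tests):
        if i > 0:
            sleep(gap_seconds)  # enforced break between consecutive tests
        results.append(run_probe(test))  # blocks until this test completes
    return results
```

Injecting `sleep` keeps the timing policy testable; in a real daemon the default `time.sleep` would be used.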
Architecture - II
Results from all probes are written to a data directory and loaded by the load-datad daemon, which ensures that the database is not bombarded by hundreds of simultaneous writes
Analysis scripts run every hour or two, depending on how long they take: plotting data, traceroute reports, and master web page generation
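A minimal sketch of the load-datad idea: gather whatever result files have accumulated in the data directory and load them in a single transaction, instead of letting every probe write to the database on its own. It uses Python's built-in `sqlite3` as a stand-in for MySQL; the one-record-per-line `timestamp,node,value` file format, the `RESULTS` table, and the `.done` renaming are all hypothetical.

```python
import sqlite3
from pathlib import Path


def load_pending(data_dir: str, conn: sqlite3.Connection) -> int:
    """Load every pending *.dat file from data_dir in one transaction.

    Hypothetical file format: each non-blank line is "timestamp,node,value".
    Returns the number of rows inserted.
    """
    files = sorted(Path(data_dir).glob("*.dat"))
    rows = []
    for f in files:
        for line in f.read_text().splitlines():
            if line.strip():
                ts, node, value = line.split(",")
                rows.append((int(ts), node, float(value)))
    with conn:  # a single commit for the whole batch
        conn.executemany(
            "INSERT INTO RESULTS (ts, node, value) VALUES (?, ?, ?)", rows
        )
    for f in files:
        f.rename(f.with_suffix(".done"))  # mark as loaded so it is not reloaded
    return len(rows)
```

Batching means the database sees one transaction per pass of the loader, however many probes finished in the meantime.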
MySQL Database Tables - I
NODES – one entry per node, with its specs (latitude, longitude, contact, paths, et al.)
MONHOST – information on the active monitoring host(s) (web/CGI paths, data analysis specs, et al.)
TOOLSPECS – probe specifications (probe, probe options, frequency, testtype, et al.)
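A hedged schema sketch makes these table descriptions concrete. Only the three table names come from the slide; every column name and the assumed units are hypothetical illustrations of the specs listed, and `sqlite3` stands in for MySQL.

```python
import sqlite3

# Hypothetical columns; only the table names are taken from the slide.
SCHEMA = """
CREATE TABLE NODES (
    nodename  TEXT PRIMARY KEY,
    latitude  REAL,
    longitude REAL,
    contact   TEXT,
    datapath  TEXT              -- where this node's data is kept
);
CREATE TABLE MONHOST (
    hostname  TEXT PRIMARY KEY,
    webpath   TEXT,
    cgipath   TEXT,
    analysis  TEXT              -- data analysis specs, free-form here
);
CREATE TABLE TOOLSPECS (
    probe     TEXT,             -- e.g. 'iperf', 'ping'
    options   TEXT,
    frequency INTEGER,          -- seconds between runs (assumed unit)
    testtype  TEXT              -- 'background' or 'background-syn'
);
"""


def create_schema(conn: sqlite3.Connection) -> None:
    """Create the three configuration tables in an empty database."""
    conn.executescript(SCHEMA)
```

Because probe options and frequencies live in TOOLSPECS rather than in code, a daemon can read its own rows at startup, which is the self-configuration idea from the architecture slide.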
MySQL Database Tables - II
Many types of tests are possible:
background – low-impact tests that can run concurrently (traceroute, ping, abwe)
background-syn – tests that must run one at a time (iperf)
on-demand – to be implemented
MySQL Database Tables - III
SCHEDULE
The scheduler inserts probe requests into the SCHEDULE table
Daemons read the SCHEDULE table for the probes they are responsible for within the "current" timeframe, and run those probes
All results are written to a data directory and loaded by the data-loading daemon
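The SCHEDULE handshake might look like the following sketch: the scheduler inserts timestamped probe requests, and each daemon claims only the pending rows for its own probe that fall in the current window. The column names, the `window` size, and the `status` flag are hypothetical; `sqlite3` stands in for MySQL.

```python
import sqlite3


def init_schedule(conn: sqlite3.Connection) -> None:
    # Hypothetical SCHEDULE layout: one row per requested probe run.
    conn.execute(
        "CREATE TABLE SCHEDULE (id INTEGER PRIMARY KEY, probe TEXT, target TEXT, "
        "run_at INTEGER, status TEXT DEFAULT 'pending')"
    )


def schedule_probe(conn, probe, target, run_at):
    """Scheduler side: request a run of `probe` against `target` at `run_at`."""
    with conn:
        conn.execute(
            "INSERT INTO SCHEDULE (probe, target, run_at) VALUES (?, ?, ?)",
            (probe, target, run_at),
        )


def claim_due(conn, probe, now, window=300):
    """Daemon side: claim pending rows for `probe` with now - window < run_at <= now."""
    with conn:
        rows = conn.execute(
            "SELECT id, target FROM SCHEDULE WHERE probe = ? AND status = 'pending' "
            "AND run_at > ? AND run_at <= ? ORDER BY run_at",
            (probe, now - window, now),
        ).fetchall()
        conn.executemany(
            "UPDATE SCHEDULE SET status = 'claimed' WHERE id = ?",
            [(row_id,) for row_id, _ in rows],
        )
    return rows
```

Marking rows as claimed keeps a daemon from running the same request twice if it polls again within the window.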
APIs and other utilities
Fetch-ping-data
Fetch-abwe-data
Fetch-trace-data
Fetch-bw-data (e.g., iperf)
Etc.
All take a node name and a time span and return the name of the file where the data is stored
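The shared calling convention (node name and time span in, file name out) can be sketched as below. `fetch_data`, the `RESULTS` table, and the tab-separated output are hypothetical; the actual Fetch-* utilities are perl scripts whose internals the slide does not describe.

```python
import sqlite3
import tempfile


def fetch_data(conn: sqlite3.Connection, nodename: str, start: int, end: int) -> str:
    """Write rows for `nodename` with start <= ts < end to a file; return its path."""
    rows = conn.execute(
        "SELECT ts, value FROM RESULTS WHERE node = ? AND ts >= ? AND ts < ? ORDER BY ts",
        (nodename, start, end),
    ).fetchall()
    # delete=False so the caller can open the file after this function returns
    with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as out:
        for ts, value in rows:
            out.write(f"{ts}\t{value}\n")
        return out.name
```

Returning a file name rather than the data itself lets plotting tools such as gnuplot read the result directly.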
Data Analysis
Time series plots – group and individual
Diurnal analysis & fitting
Traceroute analysis
Bandwidth change analysis – will be augmented by other methods currently being researched and developed
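The slide does not say how the diurnal fitting is done. One common approach, shown here only as an assumed illustration, is a least-squares fit of a 24-hour sinusoid, y ≈ a + b·cos(2πh/24) + c·sin(2πh/24), where h is the hour of day. The sketch solves the 3x3 normal equations directly, with no external libraries.

```python
import math
from typing import Sequence, Tuple


def fit_diurnal(hours: Sequence[float], values: Sequence[float]) -> Tuple[float, float, float]:
    """Least-squares fit of y = a + b*cos(w*h) + c*sin(w*h), with w = 2*pi/24."""
    w = 2 * math.pi / 24
    basis = [(1.0, math.cos(w * h), math.sin(w * h)) for h in hours]
    # Normal equations: (X^T X) beta = X^T y
    A = [[sum(r[i] * r[j] for r in basis) for j in range(3)] for i in range(3)]
    rhs = [sum(r[i] * y for r, y in zip(basis, values)) for i in range(3)]
    # Gaussian elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda k: abs(A[k][col]))
        A[col], A[pivot] = A[pivot], A[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, 3):
            factor = A[row][col] / A[col][col]
            for k in range(col, 3):
                A[row][k] -= factor * A[col][k]
            rhs[row] -= factor * rhs[col]
    # Back substitution
    beta = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        s = rhs[i] - sum(A[i][k] * beta[k] for k in range(i + 1, 3))
        beta[i] = s / A[i][i]
    return beta[0], beta[1], beta[2]
```

From the fitted coefficients, the daily swing is sqrt(b² + c²) and the hour of the peak is atan2(c, b)·24/(2π), taken modulo 24.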
CGI Utilities – in development
Add and update NODES
Add and update TOOLSPECS
Add and update MONHOST
Interactive data analysis
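An "add or update" utility such as the one planned for NODES reduces to an upsert. This sketch tries an UPDATE first and falls back to an INSERT when no row matched; the column names are hypothetical and the CGI form handling is omitted. (In MySQL the same operation can also be written as `INSERT ... ON DUPLICATE KEY UPDATE`.)

```python
import sqlite3


def upsert_node(conn: sqlite3.Connection, nodename, latitude, longitude, contact) -> None:
    """Add a NODES row, or update it if `nodename` already exists."""
    with conn:  # both statements commit (or roll back) together
        cur = conn.execute(
            "UPDATE NODES SET latitude = ?, longitude = ?, contact = ? WHERE nodename = ?",
            (latitude, longitude, contact, nodename),
        )
        if cur.rowcount == 0:  # no existing row, so this is an "add"
            conn.execute(
                "INSERT INTO NODES (nodename, latitude, longitude, contact) "
                "VALUES (?, ?, ?, ?)",
                (nodename, latitude, longitude, contact),
            )
```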
Informational Web Pages
Table of defined NODES
Table of defined MONHOST
Table of TOOLSPECS – probe specifications
Description of database tables
Report on data logging for the past few weeks
PLM – needs updating
Others to come – every time I have to look at something for validation, I create a web page
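Pages such as the table of defined NODES are essentially a database query rendered as HTML. A minimal sketch, with hypothetical headers and rows, that escapes every cell so that node names or contact fields cannot break the markup:

```python
import html
from typing import Sequence


def render_table(headers: Sequence[str], rows: Sequence[Sequence[object]]) -> str:
    """Render query results as a simple HTML table, escaping every cell."""
    parts = [
        "<table>",
        "<tr>" + "".join(f"<th>{html.escape(h)}</th>" for h in headers) + "</tr>",
    ]
    for row in rows:
        cells = "".join(f"<td>{html.escape(str(c))}</td>" for c in row)
        parts.append("<tr>" + cells + "</tr>")
    parts.append("</table>")
    return "\n".join(parts)
```

Feeding it the column names and rows from, for example, a SELECT on NODES produces one page per table.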
Futures
Make data available via web services
Interactive data analysis CGIs
Add additional probe types
Develop a complete distribution kit – complicated by differing locations and versions of perl, gnuplot, MySQL, graphics libraries, ploticus, iperf, etc.
Add additional anomaly detection techniques
Summary
The objective is to provide regular and reliable network probe testing and data collection from several locations around the world
Make the data available to the community
Provide a framework for incorporating a variety of analysis tools
Acknowledgements
Many people have contributed to this system over the years:
Maxim Grigoriev (FNAL), I-Heng Mei (SLAC RA), Manish Bhargava (SLAC RA), Ruchi Gupta (SLAC RA), Mahesh Chhaparia (SLAC RA), Parakram Khandpur (SLAC RA)
And of course: Les Cottrell
Questions & Considerations
BWCTL – not installed everywhere, and it is one more thing I would need to install and maintain as part of the distribution kit
It does not do multiple iperf streams
We may want other heavyweight tests that BWCTL does not provide
OWAMP – requires a special NTP configuration