Monitoring Evolution and IPv6

1 Monitoring Evolution and IPv6
Alberto AIMAR, IT-CM-MM

2 Outline
- Context
- Data Centres Monitoring
- Experiments Dashboards
- Architecture
- Plans
- Status
- Demo

3 Monitoring
Data Centres Monitoring
- Monitoring of the data centres (DC) at CERN and Wigner
- Hardware, operating system, and services
- Data centre equipment (PDUs, temperature sensors, etc.)
- Used by service providers in IT and by the experiments
Experiment Dashboards
- Site availability, data transfers, job information, reports
- Used by WLCG, experiments, sites and users
Both hosted by CERN IT, in different teams

4 Context
Focus for 2016
- Regroup monitoring activities hosted by CERN/IT (Data Centres, Experiment Dashboards, ETF, HammerCloud, etc.)
- Continue existing services
- Align with CERN IT practices
  - Management of services, communication, tools (e.g. GGUS and SNOW tickets)
Starting with
- Merge Data Centres and Experiment Dashboards monitoring technologies
- Review existing monitoring usage and needs (IT, WLCG, etc.)
- Investigate new technologies
- Unchanged support while collecting user feedback

5 Unified Monitoring Architecture
[Architecture diagram. Labels: Data Sources (Data Centers, WLCG), Transport (Kafka), Processing, Storage/Search, Data Views]

6 Experiment Dashboards
- Experiment Dashboard covers the full range of experiments’ computing activities
- Provides information to different categories of users
[Diagram. Application areas: Job Monitoring (Analysis + Production; real-time and accounting views), Data Management Monitoring (data transfer, data access), Infrastructure Monitoring (Site Status Board, SAM3), Outreach (Google Earth Dashboard). User categories: Operation Teams, Sites, Users, General Public. Remaining label: "users per day".]

7 Experiment Dashboards
Job monitoring, sites availability, data management and transfers Used by experiments operation teams, sites, users, WLCG

8 Current Monitoring
[Diagram: the current monitoring data flow, with a separate tool chain per area (Data Centres Monitoring, Data management and transfers, Job Monitoring, Infrastructure Monitoring). Columns and the components named in them:
- Data Sources: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS
- Transport: Flume, AMQ, Kafka, GLED, HTTP Collector, SQL Collector, MonALISA Collector, HTTP GET, HTTP PUT
- Storage & Search: HDFS, ElasticSearch, Oracle
- Processing & Aggregation: Spark, Hadoop Jobs, Oracle PL/SQL, ESPER, ES Queries, GNI
- Display & Reports: Kibana, Jupyter, Zeppelin, and Experiment Dashboard (ED) views: Dashboards, Real Time, Accounting, API, SSB, SAM3]

9 Unified Monitoring
[Diagram: the unified monitoring data flow, with one shared tool chain for all areas. Columns and the components named in them:
- Data Sources: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS, Other
- Transport: Flume, AMQ, Kafka
- Storage & Search: Hadoop HDFS, ElasticSearch, Other
- Processing & Aggregation: Spark, Hadoop Jobs, GNI, Other
- Display & Reports: Kibana, Jupyter, Zeppelin, Other]

10 Status
Producers and Transport
- Moving all data via new transport (Flume, AMQ, Kafka)
Storage and Search
- Data in ES and Hadoop
Processing
- Doing aggregation and processing via Spark
Display and reports
- Experimenting using only the standard features of ES, Kibana, Spark, Hadoop
- Introduce notebooks and data discovery
General
- Selecting technologies, learning on the job, looking for expertise
- Evolve interfaces (e.g. dashboards for users, shifters, sites, managers)

11 IPv6 and Monitoring for WLCG
We are confident that there are no major issues:
- No major changes vs. the check in 2013
- Evolution to the new architecture will take IPv6 into account
- Using mainstream technologies, with very little code of our own
Data sources
- Relies on external systems providing monitoring data
- Depends on data provided by FTS, Rucio, Panda, CRAB3, Xrootd, etc.
- MonALISA is external, used by ALICE and other projects (tested by ML devs)
Transport
- Receives data via AMQ/Stomp, Flume, UDP, databases and HTTP sources
- It is a matter of staying up to date with IPv6-ready versions
[Diagram, Data Sources column: Metrics Manager, Lemon Agent, XSLS, ATLAS Rucio, FTS Servers, DPM Servers, XROOTD Servers, CRAB2, CRAB3, WM Agent, Farmout, Grid Control, CMS Connect, PANDA WMS, ProdSys, Nagios, VOFeed, OIM, GOCDB, REBUS]
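
Even with very little code of one's own, one place where IPv6 tends to surface is the parsing of "host:port" endpoint strings for transports such as AMQ/Stomp or HTTP, because an IPv6 literal itself contains colons. The following is only an illustrative sketch, not code from the monitoring services: the helper name `split_endpoint` and the endpoints shown are hypothetical, and it assumes IPv6 literals are written in the usual bracketed form.

```python
from typing import Optional, Tuple
from urllib.parse import urlsplit


def split_endpoint(endpoint: str) -> Tuple[str, Optional[int]]:
    """Split 'host:port' into (host, port).

    IPv6 literals must be bracketed, e.g. '[2001:db8::1]:61613'.
    The port is None when the endpoint does not specify one.
    """
    # Prefixing '//' makes urlsplit treat the string as a network location,
    # so the standard bracket rules for IPv6 literals are applied.
    parts = urlsplit("//" + endpoint)
    if not parts.hostname:
        raise ValueError(f"no host in endpoint: {endpoint!r}")
    return parts.hostname, parts.port


# Hypothetical endpoints, using documentation-only names and addresses.
for ep in ("broker.example.org:61613", "192.0.2.10:8080", "[2001:db8::1]:61613"):
    print(ep, "->", split_endpoint(ep))
```

Splitting on the last colon by hand would also work for bracketed input, but delegating to the standard library keeps the bracket handling in one well-tested place.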

12 IPv6 and Monitoring for WLCG
[Diagram, remaining columns. Transport: Flume, AMQ, Kafka. Storage & Search: Hadoop HDFS, ElasticSearch, Other. Processing & Aggregation: Spark, Hadoop Jobs, GNI, Other. Display & Reports: Kibana, Jupyter, Zeppelin, Other.]
Storage
- Storing mostly host names only, as strings
- In a few cases the current Experiment Dashboards may store worker node (WN) IP addresses; this will be fixed in the migration
- ElasticSearch has an IPv4 data type, but no IPv6 type at the moment; it is expected to come
Display and reports
- Only showing IPv4 and IPv6 hosts, with names as strings
- Web applications can easily be made reachable by IPv6 nodes, but actions will be needed (just like for any other web server)
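
Storing hosts as plain strings works for both address families as long as the strings are written consistently. As a minimal sketch of that idea (not the dashboards' actual code), the hypothetical helper below uses Python's standard `ipaddress` module to put any IP address into its canonical text form before it is stored as a string, and passes host names through lowercased. The field names `host` and `kind` are assumptions made for illustration.

```python
import ipaddress


def describe_host(value: str) -> dict:
    """Return a storable record for a host name or an IP address string."""
    try:
        addr = ipaddress.ip_address(value)
    except ValueError:
        # Not an IP literal: treat it as a host name and store it lowercased.
        return {"host": value.lower(), "kind": "hostname"}
    # Canonical text form, so equal addresses always yield equal strings.
    return {"host": addr.compressed, "kind": f"ipv{addr.version}"}


# Hypothetical inputs, using documentation-only names and addresses.
for raw in ("Node01.example.org", "192.0.2.7", "2001:0DB8:0000:0000:0000:0000:0000:0001"):
    print(raw, "->", describe_host(raw))
```

Normalising before storage means that two spellings of the same IPv6 address match in an exact string search, which matters when the store has no native IPv6 type.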

13 Plans
Unified architecture and technologies
- Focus on migrating to a common architecture
- Review the existing architecture, areas and data
Update to new technologies in several areas
- Better performance and new versions with new features and major improvements
- Look into technologies as needed (collectd, Kafka, Grafana, etc.)
- Benefit from experience and feedback received from Experiments, WLCG and IT groups
Move to central services
- Central service for ES is being created; InfluxDB for time series (DBoD)
- Continue to use central Hadoop services
Continue with standard operations and upgrades
- At least for all of 2016
- Make the new monitoring platform available, in parallel with the existing ones

14 Conclusions
- No major changes vs. what was reported in 2013
- Mainstream technologies benefit from community effort and/or official support for IPv6 readiness
- Evolution to the new architecture is used to review the whole of the monitoring data
  - IPv6 is one of the reviews we will do
- No specific issues for monitoring detected

15 Demo
Data in ElasticSearch
- FTS and Xrootd transfers data
Examples of Dashboards
- Data discovery and error investigation
- Specific views for specific tasks
  - VO overview
  - Site manager

17 Data Centres Monitoring

18 Monitoring Technologies
Area            | Services and Components | Technologies           | Functions
Data Collectors | Metric Manager          | CERN                   | Metrics registration
Data Collectors | Lemon Agent             | CERN + Flume           | Metrics producers (about 15000)
Transport       | Gateway                 | Flume                  | Transport host metrics
Transport       | XSLS                    |                        | Service metrics
Transport       | Messaging               | Active MQ              | Messaging of metrics
Transport       | River                   | Kafka                  | Streaming of metrics
Aggregation     | Foz                     | Spark                  | Processing streamed metrics
Archive         | HDFS                    | Hadoop                 | Long term storage
Displays        | Meter                   | ElasticSearch + Kibana | Dashboard for metrics
Displays        | Timber                  | ElasticSearch + Kibana | Dashboard for logs
Displays        | Meter Proxy             | CLI and HTTP           | Interface to ES
Alerts          | GNI                     |                        | Alarms handling

