CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities Agile Infrastructure Monitoring CERN IT/CF.

Slides:



Advertisements
Similar presentations
Thanks to Microsoft Azure’s Scalability, BA Minds Delivers a Cost-Effective CRM Solution to Small and Medium-Sized Enterprises in Latin America MICROSOFT.
Advertisements

CERN IT Department CH-1211 Genève 23 Switzerland t The Agile Infrastructure Project Monitoring Markus Schulz Pedro Andrade.
1 Software & Grid Middleware for Tier 2 Centers Rob Gardner Indiana University DOE/NSF Review of U.S. ATLAS and CMS Computing Projects Brookhaven National.
Evaluation of NoSQL databases for DIRAC monitoring and beyond
CERN IT Department CH-1211 Genève 23 Switzerland t Messaging System for the Grid as a core component of the monitoring infrastructure for.
Take An Internal Look at Hadoop Hairong Kuang Grid Team, Yahoo! Inc
CERN IT Department CH-1211 Genève 23 Switzerland t Integrating Lemon Monitoring and Alarming System with the new CERN Agile Infrastructure.
LHC Experiment Dashboard Main areas covered by the Experiment Dashboard: Data processing monitoring (job monitoring) Data transfer monitoring Site/service.
Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As.
CERN - IT Department CH-1211 Genève 23 Switzerland t Monitoring the ATLAS Distributed Data Management System Ricardo Rocha (CERN) on behalf.
GRACE Project IST EGAAP meeting – Den Haag, 25/11/2004 Giuseppe Sisto – Telecom Italia Lab.
M i SMob i S Mob i Store - Mobile i nternet File Storage Platform Chetna Kaur.
Flexibility and user-friendliness of grid portals: the PROGRESS approach Michal Kosiedowski
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
CERN IT Department CH-1211 Genève 23 Switzerland t Internet Services Job Monitoring for the LHC experiments Irina Sidorova (CERN, JINR) on.
Communicate with All Workers Involved in the Process of Delivering High-Quality Health Care by Choosing Dossier365 on the Azure Platform MICROSOFT AZURE.
CERN IT Department CH-1211 Geneva 23 Switzerland t Daniel Gomez Ruben Gaspar Ignacio Coterillo * Dawid Wojcik *CERN/CSIC funded by Spanish.
1 Geospatial and Business Intelligence Jean-Sébastien Turcotte Executive VP San Francisco - April 2007 Streamlining web mapping applications.
CERN IT Department CH-1211 Genève 23 Switzerland t MSG status update Messaging System for the Grid First experiences
Techcello Provides SaaS Lifecycle Management Solution to “SaaS-ify” Your Application Efficiently on the Powerful Microsoft Azure Cloud Platform MICROSOFT.
CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities IT Monitoring IT/CF IT/OIS – IT/PES IT Technical.
CERN IT Department CH-1211 Geneva 23 Switzerland t GDB CERN, 4 th March 2008 James Casey A Strategy for WLCG Monitoring.
== Enovatio Delivers a Scalable Project Management Solution Minus Large Upfront Infrastructure Costs, Thanks to the Powerful Microsoft Azure Platform MICROSOFT.
WLCG infrastructure monitoring proposal Pablo Saiz IT/SDC/MI 16 th August 2013.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSG - A messaging system for efficient and.
CERN IT Department CH-1211 Genève 23 Switzerland t IT Configuration Activities Gavin McCance Online Cross-experiment Meeting, 14 June 2012.
Microsoft Azure Integrated with C21 Live Cloud Mosaic Helps Control Your Live Streaming from Anywhere by Deploying in Global Azure Regions MICROSOFT AZURE.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Agile Infrastructure Monitoring HEPiX Spring th April.
CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities IT Monitoring CERN IT-CF HEPiX Fall 2013.
+ Logentries Is a Real-Time Log Analytics Service for Aggregating, Analyzing, and Alerting on Log Data from Microsoft Azure Apps and Systems MICROSOFT.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF CF Monitoring: Lemon, LAS, SLS I.Fedorko(IT/CF) IT-Monitoring.
Microsoft Azure and DataStax: Start Anywhere and Scale to Any Size in the Cloud, On- Premises, or Both with a Leading Distributed Database MICROSOFT AZURE.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN IT Monitoring and Data Analytics Pedro Andrade (IT-GT) Openlab Workshop on Data Analytics.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Alarming with GNI VOC WG meeting 12 th September.
Experiments in Utility Computing: Hadoop and Condor Sameer Paranjpye Y! Web Search.
CERN IT Department CH-1211 Genève 23 Switzerland t Migration from ELFMs to Agile Infrastructure CERN, IT Department.
CASTOR logging at RAL Rob Appleyard, James Adams and Kashyap Manjusha.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF CC Monitoring I.Fedorko on behalf of CF/ASI 18/02/2011 Overview.
CERN IT Department CH-1211 Genève 23 Switzerland t CERN Agile Infrastructure Monitoring Pedro Andrade CERN – IT/GT HEPiX Spring 2012.
Streaming Analytics with Spark 1 Magnoni Luca IT-CM-MM 09/02/16EBI - CERN meeting.
Computing Facilities CERN IT Department CH-1211 Geneva 23 Switzerland t CF Cluman: Advanced Cluster Management for Large-scale Infrastructures.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES The Common Solutions Strategy of the Experiment Support group.
Platform & Engineering Services CERN IT Department CH-1211 Geneva 23 Switzerland t PES Agile Infrastructure Project Overview : Status and.
Experiment Support CERN IT Department CH-1211 Geneva 23 Switzerland t DBES Author etc Alarm framework requirements Andrea Sciabà Tony Wildish.
CERN IT Department CH-1211 Genève 23 Switzerland t Monitoring: Present and Future Pedro Andrade (CERN IT) 31 st August.
Abstract MarkLogic Database – Only Enterprise NoSQL DB Aashi Rastogi, Sanket V. Patel Department of Computer Science University of Bridgeport, Bridgeport,
Data Analytics and Hadoop Service in IT-DB Visit of Cloudera - April 19 th, 2016 Luca Canali (CERN) for IT-DB.
Grid Technology CERN IT Department CH-1211 Geneva 23 Switzerland t DBCF GT Our experience with NoSQL and MapReduce technologies Fabio Souto.
Snip2Code: Search, Share and Collect Code Snippets Faster, Easier, Efficiently with Power of Microsoft Azure Platform MICROSOFT AZURE ISV PROFILE: SNIP2CODE.
IT Monitoring Service Status and Progress 1 Alberto AIMAR, IT-CM-MM.
A presentation on ElasticSearch
Pilot Kafka Service Manuel Martín Márquez. Pilot Kafka Service Manuel Martín Márquez.
AuraPortal Cloud Helps Empower Organizations to Organize and Control Their Business Processes via Applications on the Microsoft Azure Cloud Platform MICROSOFT.
Monitoring Evolution and IPv6
Agenda:- DevOps Tools Chef Jenkins Puppet Apache Ant Apache Maven Logstash Docker New Relic Gradle Git.
Database Services Katarzyna Dziedziniewicz-Wojcik On behalf of IT-DB.
WinCC-OA Log Analysis SCADA Application Service - Reporting
Open Source distributed document DB for an enterprise
POW MND section.
CWG10 Control, Configuration and Monitoring
Experience in CMS with Analytic Tools for Data Flow and HLT Monitoring
IT Monitoring Service Status and Progress
The Improvement of PaaS Platform ZENG Shu-Qing, Xu Jie-Bin 2010 First International Conference on Networking and Distributed Computing SQUARE.
Logsign All-In-One Security Information and Event Management (SIEM) Solution Built on Azure Improves Security & Business Continuity MICROSOFT AZURE APP.
CS6604 Digital Libraries IDEAL Webpages Presented by
Quasardb Is a Fast, Reliable, and Highly Scalable Application Database, Built on Microsoft Azure and Designed Not to Buckle Under Demand MICROSOFT AZURE.
Project Goals Collect and permanently store the data flowing around ONAP system into several Big Data storages, each in different category. Also serve.
DBOS DecisionBrain Optimization Server
Production Manager Tools (New Architecture)
Presentation transcript:

CERN IT Department CH-1211 Geneva 23 Switzerland t CF Computing Facilities Agile Infrastructure Monitoring CERN IT/CF CHEP th October 2013

Computing Facilities 2 Motivation –Several independent monitoring activities in CERN IT –Combination of data from different groups necessary –Understanding performance became more important –Move to a virtualised dynamic infrastructure Challenges –Implement a shared architecture and common tool-chain –Delivered under a common collaborative effort Introduction

Computing Facilities –Monitoring team with contributions from few IT groups –Architecture design and definition –Several initial prototypes and studies 2013 –Core monitoring team (in IT/CF since March 2013) –Close collaboration with other IT groups –Implementation and deployment of solutions History

Computing Facilities 4 Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport

Computing Facilities 5 –Integrate data from multiple producers –Support legacy producers Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport

Computing Facilities 6 –Scalable transport to collect monitoring and operations data –Easy integration with providers and consumers Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport

Computing Facilities 7 Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport –Long term archival of collected data –Offline processing of collected data –Allow future data replay to other tools

Computing Facilities 8 Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport –Limited data retention –Real-time queries –Easy to deploy and horizontal scalable

Computing Facilities 9 Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport –Dynamic creation of dashboards –Tailored for global and application specific views –User friendly

Computing Facilities 10 Architecture Analytics Long Term Repository Analytics Long Term Repository Notifications Visualization Producer Transport –Quick and reliable delivery of alarms –Delivery of notifications via multiple channels

Computing Facilities 11 Adopt open source tools –For each architecture block look outside for solutions –Large adoption and strong community support –Fast to adopt, test, and deliver (and easily replaceable) Integrate with new CERN infrastructure –AI project, OpenStack, Puppet, Roger, etc. Focus on simple adoption (e.g. puppet modules) Technologies

Computing Facilities 12 Technologies

Computing Facilities 13 Distributed service for collecting large amounts of data – Robust and fault tolerant – Horizontally scalable, multi-tier deployment – Many ready to be used input and output plugins Avro, Thrift, JMS, Syslog, HTTP, ES, HDFS, Custom, … – Java based, Apache license Feedback – Needs tuning to correctly size flume tiers – Available plugins saved a lot of time Flume

Computing Facilities 14 Distributed framework for large data sets processing Distributed filesystem designed for commodity HW – Suitable for applications with large data sets – Cluster provided by other IT group (DSS) – Data stored by cluster (might be revised) – Daily jobs to aggregate data by month Feedback – Large analytics potential to explore – Reliable external long term repository Hadoop HDFS 14

Computing Facilities 15 Distributed RESTful search and analytics engine – Real time acquisition, data is indexed in real time – Automatically balanced shards and replicas – Schema free, document oriented (JSON) – Based on Lucene (full-featured IR library) Feedback – Easy to deploy and manage – Robust and fast API – Powerful query language (DSL) ElasticSearch 15

Computing Facilities 16 Visualize time-stamped data from ElasticSearch – Designed to analyse log, perfectly fits time stamped data – No code, point & click to build your own dashboard – Built with AngularJS (from google) Feedback – Easy to install and configure – Very cool user interface – Fits many use cases (e.g. text, metrics) – Still limited feature set, but active growing community Kibana

Computing Facilities 17 General Notifications Infrastructure – Based on ActiveMQ messaging broker – Same monitoring producers (e.g. lemon) – Multiple consumers Snow consumer for ticket creation Dashboard consumer for web application No contact for node heartbeat checking Feedback – Efficient, direct ticket routing – Flexible, easy to add more consumers (e.g. SMS) GNI

Computing Facilities 18 Producers –From all puppet-based data centre nodes –Central monitoring Computing Facilities (lemon, syslog) –Application monitoring OS & Infrastructure Services (openstack) Platform & Engineering Services (batch lsf) Security Team (netlog, snoopy) Databases Services (web apps) Data and Storage Services (castor logs) Support for Distributed Computing (wlcg monitoring) Deployment

Computing Facilities 19 Flume –10 aggregator nodes, 5 nodes to HDFS + 5 nodes to ES HDFS –~500 TB cluster, 1.8 TB collected since mid July 2013 ElasticSearch –1 master node, 1 search node, 8 data nodes –90 days TTL, 10 shards/index, 2 replicas/shards –Running ElasticSearch Kibana plugin Deployment

Computing Facilities 20 Deployment

Computing Facilities 21 Several interesting technologies tested and deployed Full workflow deployed for concrete monitoring needs Verified by different monitoring producers Improve monitoring tools under a common effort Summary

Computing Facilities 22 Thank you !! Questions ??

Computing Facilities 23 Backup Slides

Computing Facilities 24 Sources –Avro, Thrift, JMS, Syslog, HTTP, Custom, … Sinks –Avro, Thrift, ES, Hadoop HDFS, Custom, … Flume