CERN Data Analytics Use Cases

Presentation transcript: "CERN Data Analytics Use Cases"

1 CERN Data Analytics Use Cases
Manuel Martin Marquez, Stefano Alberto Russo

2 Today's Objectives
Overview: Data Analytics at CERN. Use Cases: context, status, technologies applied, limitations. Future Plans and Challenges: real-time analytics, batch analytics, repository, AaaS (Analytics as a Service). Manuel Martin Marquez – CERN openlab

3 Overview: Data Analytics at CERN

4 Overview: Data Analytics at CERN
Huge interest and potential benefits for different domains at CERN: IT (Information Technology), BE (Beams), EN (Engineering), PH (Physics). No one size fits all: real time, batch, data management; different solutions, technologies, approaches.

5 Use Cases: Accelerator Data Analysis
Context: The Logging Service persists several million signals. Core infrastructure: electricity. Industrial data: cryogenics, vacuum. Beam data: position, currents, losses. Critical service: about 800 extraction clients, about 120 custom applications, more than 5 million requests per day. Massive increase (2-3x) in 2014. Extensive use of Oracle technologies.

6 Use Cases: Accelerator Data Analysis
Current Status: Data analysis focused on the service itself: improving reaction time and performance. Basic streaming analysis: Complex Event Processing, based on system knowledge. Basic in-database analysis: built in (PL/SQL), embedded in the extraction tool. Complex cases pushed to the users: Matlab, Python, LabVIEW, etc.
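
To make the "basic streaming analysis" item concrete, here is a minimal Python sketch of a rule in the style of complex event processing: report a composite event when a signal exceeds a limit several times within a short sliding window. The function name, the parameters `limit`, `count` and `window_s`, and the readings are all hypothetical; nothing here is taken from the Logging Service itself.

```python
from collections import deque


def detect_bursts(events, limit, count, window_s):
    """Yield the timestamp at which `count` readings above `limit`
    have occurred within the last `window_s` seconds.

    `events` is an iterable of (timestamp_seconds, value) pairs in time order.
    """
    recent = deque()  # timestamps of recent above-limit readings
    for ts, value in events:
        if value <= limit:
            continue
        recent.append(ts)
        # Drop exceedances that have fallen out of the sliding window.
        while recent and ts - recent[0] > window_s:
            recent.popleft()
        if len(recent) >= count:
            yield ts
            recent.clear()  # start fresh so one burst is reported once


# Hypothetical readings: (seconds, value)
readings = [(0, 1.0), (1, 5.2), (2, 5.5), (3, 0.9), (4, 5.1), (30, 5.3)]
bursts = list(detect_bursts(readings, limit=5.0, count=3, window_s=10))
```

With these readings the rule fires once, at the third exceedance inside the window; the isolated exceedance at 30 s does not trigger it.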

7 Use Cases: Accelerator Data Analysis
Future Plans and Challenges: Extract more value from the data: focus on accelerator operations. Make common data analysis use cases easier: save and share analysis results; simple and fast access to data; domain-specific language. Not replacing the existing infrastructure and tools.

8 Use Cases: Industrial Control Systems Analytics
Context: Support for large industrial control systems. Five major installations with millions of parameters: ALICE, ATLAS, CMS, LHCb, Accelerator Complex. Many equipment groups: Cryo, Gas, Vacuum, Machine Protection.

9 Use Cases: Industrial Control Systems Analytics
Current Status: Monitoring and control problem solved. Alerting and reporting system: manually configured, based on thresholds. Huge data volume acquired and stored: OS logs, performance metrics, device status, measurements, alarms. Not efficiently exploited.
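
A threshold-based, manually configured alerting rule of the kind described above might look like the following Python sketch. The parameter names and limits in `THRESHOLDS` are invented for illustration and do not come from any CERN control system.

```python
# Hypothetical, manually configured limits: parameter name -> (low, high)
THRESHOLDS = {
    "cryo.temperature_k": (1.8, 2.2),
    "vacuum.pressure_mbar": (0.0, 1e-6),
}


def check(parameter, value, thresholds=THRESHOLDS):
    """Return an alert message if `value` is outside its configured range, else None."""
    if parameter not in thresholds:
        return None  # no rule configured for this parameter
    low, high = thresholds[parameter]
    if value < low:
        return f"{parameter} below {low}: {value}"
    if value > high:
        return f"{parameter} above {high}: {value}"
    return None


alert = check("cryo.temperature_k", 2.5)
```

Each parameter needs its own hand-maintained entry, and the rule looks at one value at a time, which is one way to read the slide's remark that the stored data is not exploited efficiently.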

10 Use Cases: Industrial Control Systems Analytics
Current Status: P.O.C. already started: Control System Health, Alarm Statistical Analysis, Gas System Breakdown. Evaluation of different technologies: Drools, WatchCAT. Facing problems: data access (sensitive and protected); integration of different data sources; common data analysis problems (classification, completeness).

11 Use Cases: Industrial Control Systems Analytics
Future Plans and Challenges: AaaS: common framework for all the subsystems; configurable analysis flow by user; high scalability of analysis processes; near real time and batch analysis. Stream-based data processing engine: CEP, Storm. NoSQL data storage engine.

12 Use Cases: ATLAS Distributed DM System Analytics
Context: The Distributed Data Management (DDM) project manages ATLAS data on the Grid: 150 PB, 1000 active users, 500 million files.

13 Use Cases: ATLAS Distributed DM System Analytics
Current Status: Uses both SQL and NoSQL. NoSQL complementary to RDBMS. Different use cases: popularity analysis, data aggregation and statistical analysis.

14 Use Cases: ATLAS Distributed DM System Analytics
Future Plans and Challenges: More complex use cases. Trace mining: analysis of the client interactions. Replicas automatically managed: deletion, creation. Forecasting future dataset popularity.
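
The slide does not say which forecasting method is planned. As one simple possibility, the Python sketch below applies simple exponential smoothing to a dataset's weekly access counts to get a one-step-ahead popularity estimate. The function name, the smoothing factor `alpha` and the sample counts are assumptions made for illustration.

```python
def exp_smooth_forecast(history, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing.

    `history` is a non-empty list of past access counts, oldest first.
    Larger `alpha` gives more weight to recent observations.
    """
    if not history:
        raise ValueError("history must not be empty")
    level = history[0]
    for value in history[1:]:
        level = alpha * value + (1 - alpha) * level
    return level


# Hypothetical weekly access counts for one dataset
weekly_accesses = [10, 20, 40]
next_week = exp_smooth_forecast(weekly_accesses, alpha=0.5)
```

A forecast like this could feed the automatic replica creation and deletion mentioned above, for example by creating replicas only for datasets whose predicted popularity exceeds a chosen level.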

15 Use Cases: Intelligent data placement for CMS
Context: CMS Grid resources for storage and offline analysis. Hundreds of users. Daily up to jobs. Data samples replicated. 23 PB of data. 18 PB transferred last year.

16 Use Cases: Intelligent data placement for CMS
Current Status: Current data management model: manpower intensive, inefficient disk usage. Data Popularity Services. Cleaning Agent: automatic deletion of obsolete replicas. Implemented using Oracle DB: metrics (number of accesses, users, sites, processing time, etc.); multiple aggregations (job success/failure, set of files, etc.).
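
The interplay between a popularity metric and a cleaning agent can be sketched in a few lines of Python. This is only an illustration under assumed rules: the record layout, the `min_accesses` cut-off, the `keep` count and all dataset and site names are hypothetical, and the real services are implemented on Oracle DB rather than in Python.

```python
from collections import Counter


def popularity(access_log):
    """Count accesses per dataset from (dataset, user, site) records."""
    return Counter(dataset for dataset, _user, _site in access_log)


def obsolete_replicas(replicas, counts, min_accesses, keep):
    """Pick replicas to delete: datasets with fewer than `min_accesses`
    accesses keep only their first `keep` replicas."""
    to_delete = []
    for dataset, sites in replicas.items():
        if counts.get(dataset, 0) < min_accesses:
            to_delete.extend((dataset, site) for site in sites[keep:])
    return to_delete


# Hypothetical access records and replica locations
log = [
    ("ds_a", "u1", "T2_X"),
    ("ds_a", "u2", "T2_Y"),
    ("ds_a", "u1", "T2_X"),
    ("ds_b", "u3", "T2_X"),
]
replicas = {
    "ds_a": ["T1_A", "T2_X", "T2_Y"],
    "ds_b": ["T1_A", "T2_X", "T2_Y"],
}
counts = popularity(log)
deletions = obsolete_replicas(replicas, counts, min_accesses=2, keep=1)
```

Here `ds_b` is accessed only once, so all but its first replica are proposed for deletion, while the more popular `ds_a` keeps all three.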

17 Use Cases: Intelligent data placement for CMS
Future Plans and Challenges: LHC Run 2 implies a 6x factor in computing resources. Critical to optimize resources: job time spent accessing and analysing data; minimizing the number of replicas. Extract further knowledge from monitoring data: classify analysis activities and predict resources; recommendation systems; learn from past trends and patterns.

18 Use Cases: Intelligent data placement for CMS
Future Plans and Challenges: Near real time: based on knowledge extracted from the data (batch); CEP. Batch analysis. Different technologies: R; in-database analytics (Oracle R Enterprise); Hadoop; Elasticsearch.

19 Use Cases: Network Monitoring WLCG
Context: WLCG relies heavily on the underlying networks, which interconnect sites and resources. perfSONAR: network performance measurement and monitoring.

20 Use Cases: Network Monitoring WLCG
Current Status: perfSONAR deployed on 70% of the infrastructure. A lot of data, but making sense of it is not trivial at all: measurements span different time periods; they measure different things (while all related to the network); they might be affected by other measurements and/or events.

21 Use Cases: Network Monitoring WLCG
Future Plans and Challenges: From monitoring to intelligent and predictive monitoring. Time correlation: during a PS throughput test, was there any known activity on the same link? There is packet loss: does this appear as degraded performance somewhere at the same time? Loss of performance: is it a network problem, and where? Is it a storage problem? Analyze the existing data, mining it for issues already known from the past.
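
The time-correlation questions above reduce to finding events on the same link whose time interval overlaps a given measurement. A minimal Python sketch, assuming each measurement and event is a dictionary with hypothetical `link`, `start` and `end` fields (times in seconds):

```python
def events_during(test, events):
    """Return events on the same link whose time interval overlaps the test's.

    Two closed intervals overlap when each one starts before the other ends.
    """
    return [
        e for e in events
        if e["link"] == test["link"]
        and e["start"] <= test["end"]
        and test["start"] <= e["end"]
    ]


# Hypothetical throughput test and packet-loss events
throughput_test = {"link": "site_a-site_b", "start": 100, "end": 160}
loss_events = [
    {"link": "site_a-site_b", "start": 90, "end": 110, "kind": "packet_loss"},
    {"link": "site_a-site_b", "start": 200, "end": 210, "kind": "packet_loss"},
    {"link": "site_a-site_c", "start": 120, "end": 130, "kind": "packet_loss"},
]
suspects = events_during(throughput_test, loss_events)
```

Only the first event is returned: the second is on the right link but too late, and the third overlaps in time but is on another link.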

22 Use Cases: IT Monitoring and Analytics
Context: Monitoring in IT covers a wide range of resources: hardware, OS, applications, files, jobs, etc. Many high-level resources are interdependent. Several application-specific monitoring solutions. Similar needs and architecture: publish metric results, aggregate results, alarms, etc. Different technologies and tool-chains, some based on commercial solutions. Similar limitations and problems: limited sharing of monitoring data.

23 Use Cases: IT Monitoring and Analytics
Current Status: Data storage and analysis: store all monitoring data in a common location; feed the system with processed data; use one single common data format (JSON); permanent storage for historical data. Data visualization and alarms: easy-to-use dashboards; efficient delivery of notifications.
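
The slide names JSON as the single common format but does not give a schema. The Python sketch below shows what normalizing one metric into such a record could look like; the field names (`source`, `host`, `metric`, `value`, `timestamp`) and the sample values are assumptions, not the format actually used.

```python
import json
from datetime import datetime, timezone


def to_common_record(source, host, metric, value, timestamp):
    """Build a monitoring record with a fixed, producer-independent layout."""
    return {
        "source": source,
        "host": host,
        "metric": metric,
        "value": value,
        # Store times in UTC, ISO 8601, so records from all producers compare directly.
        "timestamp": timestamp.astimezone(timezone.utc).isoformat(),
    }


record = to_common_record(
    "batch", "node01", "cpu_load", 0.73,
    datetime(2014, 5, 1, 12, 0, tzinfo=timezone.utc),
)
line = json.dumps(record, sort_keys=True)  # one JSON document per record
```

Agreeing on one record layout is what lets different monitoring tools share a common store and common dashboards.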

24 Use Cases: IT Monitoring and Analytics
Future Plans and Challenges: Intelligent and predictive monitoring. Real-time analytics. Dashboard and interactive analytics. Batch analysis: data mining, exploratory. AaaS based on several technologies: Hadoop, Elasticsearch, Kibana, Storm.

25 Use Cases: Analytics in CASTOR
Context: The CERN Advanced Storage Manager (CASTOR) is the mass storage solution of CERN, including LHC data. Hierarchical storage, both disks and tapes: 12K disks, 30K tapes, more than 100 PB of data. A lot of monitoring/log data: up to 20 GB per day (~100M lines of log), totaling ~10 TB per year, stored in Hadoop/HBase and processed live for display in a cockpit.

26 Use Cases: Analytics in CASTOR
Current Status: Monitoring system in production for CASTOR and being extended to EOS. Online (simple) analysis in the Cockpit (time series and histograms). Offline analyses on the long-term storage based on Hadoop/HBase: auditing, error recovery, historical studies (e.g. usage of protocols). Service availability being registered on the Service Level Status (SLS) board.

27 Use Cases: Analytics in CASTOR
Future Plans and Challenges: The current system does not cover two important topics. Expert system: spotting "interesting" time series in a large volume of monitoring data (avoiding time-consuming eye inspection). Early warning system: finding predictive power to forecast potentially dangerous situations (e.g. overload conditions!). Investigation of these topics started together with the DB group as part of the openlab data analytics activity: finding the best models is challenging; a lot of data from different sources (standardization required!). Diagram: Data Analytics, DB, SLS, Cockpit.
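
The slide leaves open how "interesting" time series would be identified. One simple stand-in, shown here as a Python sketch, is to score each series by its largest absolute z-score and rank the series by that score, so that a person inspects only the top few. The scoring rule, function names and sample series are assumptions for illustration, not the models under investigation.

```python
from statistics import mean, pstdev


def interest_score(series):
    """Largest absolute z-score of any point in the series (0 for a flat series)."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        return 0.0
    return max(abs(x - mu) / sigma for x in series)


def rank_series(named_series, top=3):
    """Return the names of the `top` series with the highest interest score."""
    scores = {name: interest_score(values) for name, values in named_series.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top]


# Hypothetical monitoring series
named = {
    "flat": [1, 1, 1, 1],
    "noisy": [1, 2, 1, 2, 1, 2],
    "spike": [1, 1, 1, 1, 1, 1, 1, 1, 1, 10],
}
most_interesting = rank_series(named, top=1)
```

A single-number score like this ignores trends and seasonality, which is part of why the slide calls finding the best models challenging.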

28 Analytics as a Service (AaaS)
Real-time data analysis: complex event processing, pattern recognition, streaming data analysis. Batch data analytics: forecasting, modelling, knowledge discovery to be applied later in real time.
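
The idea of discovering knowledge in batch and applying it later in real time can be illustrated with a small Python sketch: a batch step learns an alarm limit from historical values, and a streaming step applies that limit to incoming readings. The "mean plus k standard deviations" rule, the function names and the data are assumptions chosen for brevity.

```python
from statistics import mean, pstdev


def learn_limit(history, k=3.0):
    """Batch step: derive an upper limit from historical values."""
    return mean(history) + k * pstdev(history)


def monitor(stream, limit):
    """Real-time step: yield (timestamp, value) readings that exceed the learned limit."""
    for ts, value in stream:
        if value > limit:
            yield ts, value


# Hypothetical historical values and live readings
history = [10, 12, 11, 9, 10, 11, 10, 9]
limit = learn_limit(history)
alerts = list(monitor([(0, 11), (1, 14), (2, 12)], limit))
```

The batch step can be rerun periodically on the repository so that the limit used by the real-time step follows changes in normal behaviour.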

29 Analytics as a Service (AaaS)
Data analytics repository: flexible data repository infrastructure; problem driven, not technology driven; a combination of RDBMS and NoSQL; integrating existing data sources and systems. Analytics framework (AaaS): real-time analytics, batch, data analytics repo, data analytics, visualization.

30 Conclusions
Huge interest and potential benefits for the CERN IT, BE, PH and EN departments. Improve our monitoring and control systems by means of data analytics: intelligent, proactive, predictive.

31 Conclusions: Challenges
Real-time analytics based on CERN use cases: based on domain knowledge and hidden knowledge extracted by batch analytics; CEP, Storm. Batch analytics: correlation analysis, forecasting, modelling, knowledge discovery. Data analytics repository. AaaS.
