Optimising CERN systems through ML & DA using controls data


1 Optimising CERN systems through ML & DA using controls data
BE ML and DA Forum Workshop
CERN, 28th May 2019
Filippo Tilaro, M. Bengulescu, Fernando Varela (BE-ICS), Manuel Gonzalez (BE-BI), in collaboration with Siemens AG CT (Munich, St. Petersburg, Brasov)
This presentation introduces work carried out by BE-ICS, in the context of the openlab project, with different groups at CERN in the domain of Data Analytics.

2 CERN Industrial Control Systems

3 BE-ICS interest in Big Data
Exploit the Big Data volume produced by our systems to:
  Extend the monitoring capabilities of the control systems by detecting symptomatic effects in the data which do not trigger alarms
  Reduce operational and maintenance costs
  Increase availability, stability and performance of the processes
  Detect anomalous behaviour and over-usage of sensors, actuators and industrial devices
  Predictive maintenance: anticipate when sensors and actuators have to be replaced
  Render Industrial Control Systems smarter: guide engineers and operators in taking corrective actions
The work of BE-ICS in Big Data spans well beyond the Department boundaries.
Control System Data Analytics

4 BE-ICS Collaborations
CERN has collaborated with Siemens since 2011 in areas of common interest:
  Evolution of SCADA systems: 1 Fellow entirely paid by Siemens
  Big Data: 1 LD post, which will now be temporarily converted into 2 Fellow positions
  Access to a network of Siemens researchers specialized in ML techniques
Collaboration with the University of Valladolid, Spain, on PID performance monitoring
Potential collaboration with the University of Marburg, Germany, on Distributed Complex Event Processing

5 Selected work done so far
Some of the work done so far in the area of Big Data:
  Identification of CERN use-cases (>40) for Big Data:
    Offline
    Stream-data
    Model trained on archived data but applied to stream-data
  Implementation of algorithms to tackle some selected cases
  Joint development of a platform for Distributed Complex Event Processing
  Evaluation of Siemens solutions: Mindsphere, IoT and edge devices, …
Anomaly detection based on ML, process optimization based on ML, Distributed Complex Event Processing, root-cause analysis => need for Edge Computing

6 Our vision
Combining cloud and edge computing into a single analytical framework:
  Stream analysis
  Central rules deployment
  Distributed computational load across multiple nodes
  Support for multiple data ingestion protocols
Users:
  Advanced: Jupyter & Python
  Dummy: simple web UIs to select the type of analysis, time windows and sets of signals for the analysis
[Architecture diagram labels: Cloud computing, UI, Rules, Analytics master, Broker, NXCALS, WinCC OA Archiving, Cloud & Edge Link, Edge computing, Broker, Analytics worker, VM, Driver, Middleware, Fieldbus]

7 Identified Use-Cases: Partial list
Use-cases per control system, in three categories: Online Monitoring, Faults diagnosis, Engineering design
Cryogenics: Detection of valves oscillation; Anomaly detection for sensors and actuators; Compensation of heat excess in LHC magnets due to e- cloud; Vibration analysis for cryo compressors; PID performance evaluation
Cooling & Ventilation: Detection of tank leaks; Detection of PLC anomalies; Assessment of dynamic vs fixed thresholds
Machine protection: LHC circuit monitoring; Identification of causes for QPS data loss; Data loss detection
CERN Power Grid: Forecast of the control system behaviour; Electrical power quality of service; Analysis of electrical power cuts; Recommendation system for WinCC OA users
Vacuum: Vacuum leaks; Understanding the degradation of the vacuum system due to leaks; Anomalies in process regulation
LHC Experiments (Gas Control Systems): Alarms flood management; Root-cause fault analysis in Gas Control Systems; Analysis of OPC-CAN middleware

8 UC1: Oscillation analysis for cryogenics valves
Goal: detect anomalous oscillation of valves
  Lifespan of valves in km!
Impact on:
  Process system stability and safety
  Communication load
  Maintenance (overuse of valves)
  Performance (Physics time)
Why data analytics?
  General algorithm to detect different oscillations
  Monitoring several thousands of signals (not manually!)
Over physical instrumentations and channels
  12136 AI, 4856 AO, 4536 DI, 1568 DO, 8000 spare and virtual channels, ~4000 analog control loops
  More than 120 PLCs Siemens S DP, 30000 conceptual objects/parameters
[Figure: examples of abnormal behaviour of valves]

9 UC1: Oscillation detection to minimize operational and maintenance cost
Actions:
  Improve tuning of control loops to deal with external disturbances & unexpected interoperations
Achievements:
  Reduce maintenance cost by extending valves’ operational life
  More stable system
Results:
  ~10% of the CRYO valves showed abnormal oscillations
  Multiple anomalies per valve, up to 20 hours/month
  Wide range of anomalous oscillation periods: from 2 hours down to seconds
Continuous anomaly detection analysis [diagram labels: CALS, Hadoop cluster, Web report, System expert]
Status: in production, ~5000 signals analysed every 24h
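
The slides do not say which oscillation-detection algorithm is used in production. As a minimal sketch only, one generic way to flag an oscillating valve is to count direction reversals whose swing exceeds a dead band. The function names, the dead band `min_swing` and the limit `max_reversals` are hypothetical and chosen for illustration.

```python
import math


def count_reversals(positions, min_swing):
    """Count direction reversals whose swing is at least min_swing.

    positions: valve positions (e.g. % opening) sampled at a fixed rate.
    min_swing: hypothetical dead band used to ignore measurement noise.
    """
    reversals = 0
    direction = 0            # +1 rising, -1 falling, 0 not yet known
    extreme = positions[0]   # running maximum (rising) or minimum (falling)
    for value in positions[1:]:
        if direction >= 0 and value <= extreme - min_swing:
            if direction == 1:
                reversals += 1          # a confirmed peak
            direction, extreme = -1, value
        elif direction <= 0 and value >= extreme + min_swing:
            if direction == -1:
                reversals += 1          # a confirmed trough
            direction, extreme = 1, value
        elif direction == 1 and value > extreme:
            extreme = value
        elif direction == -1 and value < extreme:
            extreme = value
    return reversals


def is_oscillating(positions, min_swing, max_reversals):
    """Flag a window in which the valve reverses direction too often."""
    return count_reversals(positions, min_swing) > max_reversals


# Synthetic data, not CERN measurements.
oscillating = [50 + 10 * math.sin(2 * math.pi * i / 20) for i in range(200)]
steady = [50 + 0.01 * i for i in range(200)]
print(is_oscillating(oscillating, min_swing=5, max_reversals=10))
print(is_oscillating(steady, min_swing=5, max_reversals=10))
```

A reversal counter is cheap enough to run over thousands of signals per day, but it only sees oscillations larger than the dead band; a spectral method would be needed to characterise the oscillation period.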

10 UC2: Anomaly detection in CRYO signals
Presence of different anomalies not detected by the control systems!
Possible causes:
  hardware failures/degradations
  wrong tuning/structure
  false measurements…
Impact:
  Process stability and safety
  Maintenance (overuse of valves)
  Performance and downtime
Why data analytics?
  Too complex to embed calculations into the control systems
  Learn from historical data the groups of signals with similar behaviour
[Figure: valves CV910 positions in L2 (26th June 2017)]
Direct impact on the operational cost!!! Beam dump!
Example: large excursions of 17L2 and 19L2 to compensate the temperature increase -> confirmation of additional deposit of heat load

11 UC2: Anomaly detection in CRYO signals
Signals Correlation and K-NN in action!
[Figure panels: Flipping fault detection; Signal offset detection; Oscillation detection; Faulty amplitude detection]
Multi-purpose algorithms:
  Avoid lots of specialized analyses that are difficult to maintain
  ~5000 signals analysed continuously
  Able to detect faults not foreseen by experts

12 UC2: Anomaly detection in CRYO signals
Learn the groups of sensors/actuators which behave similarly
  Physical and logical relations
  Exploit historical data (~4GB/day for Cryo)
Combine Machine Learning techniques with Experts’ knowledge
Build a model to detect abnormal system behaviours
Challenges:
  Model not specific to a domain/system
  Different types of anomalies, duration and noise
  No precise boundaries between normal/anomalous
  Mostly unsupervised training: no database of faults!
  Dynamic system => dynamic model
[Figure: Model building]
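
The UC2 slides name signal correlation and K-NN but give no implementation details. The following is a rough sketch of the general idea under stated assumptions: each signal's k most correlated neighbours are learnt from historical data, and a signal is flagged when it stops following them in a new window. The function names, `k` and `min_corr` are hypothetical, and the data is synthetic.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equally long series (0.0 if one is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx < 1e-12 or sy < 1e-12:
        return 0.0
    return cov / (sx * sy)


def nearest_neighbours(history, k):
    """Map each signal name to its k most (anti-)correlated other signals."""
    groups = {}
    for name, series in history.items():
        scored = sorted(
            ((abs(pearson(series, other)), other_name)
             for other_name, other in history.items() if other_name != name),
            reverse=True,
        )
        groups[name] = [other_name for _, other_name in scored[:k]]
    return groups


def flag_anomalies(window, groups, min_corr):
    """Return signals whose mean correlation with their neighbours is too low."""
    flagged = []
    for name, neighbours in groups.items():
        corrs = [abs(pearson(window[name], window[n])) for n in neighbours]
        if corrs and sum(corrs) / len(corrs) < min_corr:
            flagged.append(name)
    return sorted(flagged)


# Synthetic history: two families of three signals each.
a = [math.sin(i / 5) for i in range(100)]
d = [math.cos(i / 3) for i in range(100)]
history = {
    "a": a, "b": [2 * x + 1 for x in a], "c": [-x + 0.5 for x in a],
    "d": d, "e": [3 * x for x in d], "f": [x - 2 for x in d],
}
groups = nearest_neighbours(history, k=2)

# New window in which signal "a" is stuck at a constant value.
window = dict(history)
window["a"] = [0.5] * 100
print(flag_anomalies(window, groups, min_corr=0.4))
```

Because the groups are learnt from data rather than written by hand, the same check can detect several fault types (stuck value, offset, flipping) without a dedicated analysis for each, which is the point the slides make about multi-purpose algorithms.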

13 UC4: Root-cause analysis in Experiments Gas Control Systems
28 gas systems deployed around LHC
4 data servers, 51 PLCs (29 for process control, 22 for flow-cells handling)
~ physical sensors / actuators
Essential for particle detection
Reliability and stability are critical
Any variation in the gas composition can affect the accuracy of the acquired data
[Figure labels: 9 Apps, 6 Apps, 7 Apps, 6 Apps]

14 UC4: Root-cause analysis in Experiments Gas Control Systems
Diagnose an alarm flood (domino effect):
  Fault in the distribution system
  Alarms flooding
  Misleading feedback! Actual problem in the distribution and not in the Pump
Diagnosing a fault is complex: it may take weeks!
  Alarms flood: a single fault can generate up to thousands of events
  The 1st alarm is not necessarily the most relevant for the diagnosis
  The same fault generates different event sequences depending on the system status
  A single fault can stop the whole control process

15 UC4: Root-cause analysis in Experiments Gas Control Systems
Event stream analysis (Data, Analyze, Learn, Diagnose)
Identify and detect fault / abnormal patterns for Diagnosis and Prognostics
Provide experts with Root-cause and Gap Analysis using Rules and Patterns Mining
Forecasts, Trends and Early-Warnings to increase Operating Hours
Achievements:
  Identification of the root of the problem
  Algorithm learns patterns and uses them to forecast possible faults
  Early warning to operators to intervene
[Figure: event lists generated by the same fault, and the resulting alarm pattern]
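
The slides do not describe the pattern-mining algorithm itself. As an illustrative sketch only, one simple way to extract an alarm pattern from several event lists produced by the same fault is to look for the longest contiguous event sequence they all share. The function names and the event letters below are hypothetical.

```python
def ngrams(events, n):
    """All contiguous sub-sequences of length n, as a set of tuples."""
    return {tuple(events[i:i + n]) for i in range(len(events) - n + 1)}


def common_patterns(event_lists, min_len=2):
    """Longest contiguous event sequences shared by every event list."""
    shortest = min(len(events) for events in event_lists)
    for n in range(shortest, min_len - 1, -1):
        shared = set.intersection(*(ngrams(events, n) for events in event_lists))
        if shared:
            return sorted(shared)
    return []


# Hypothetical event lists recorded for three occurrences of one fault.
occurrences = [
    ["X", "T", "C", "D", "F", "A", "A"],
    ["E", "D", "N", "D", "F", "A", "A", "B"],
    ["K", "D", "F", "A", "A", "B"],
]
print(common_patterns(occurrences))
```

A contiguous match is the strictest possible definition of a pattern. Real alarm floods interleave unrelated events, so a production system would more likely need gapped sequence mining, but the sketch shows why the first alarm in a list need not be part of the shared signature.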

16 UC5: Evaluation of PID performance
Assist system engineering
BE-ICS in collaboration with the University of Valladolid (not an openlab activity with Siemens)
Based on: “Performance monitoring of industrial controllers based on the predictability of controller behaviour”, R. Ghraizi, E. Martinez, C. de Prada
Impact on the regulation of the entire control systems
  Too many PIDs to check manually!
  A general method to assess different PID structures
Many sources of faults/malfunctions:
  System status dependency
  External disturbances/factors
  Bad tuning / wrong controller type or structure
  Slow degradation
[Figure: control loop with Process and Controller blocks; signal labels u, w, y, SP, CV, v, MV]

17 UC5: Evaluation of PID performance
PID anomaly detection:
  Assess each PID model based on the historical data
  Simple performance index
Efficiency of control process:
  Time / actions taken / energy consumed to reach steady points
  Stability of the controlled variable
Improvement of ~10% of the analysed control loops
[Figure labels: Bad, Good]
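
The index used in this work is based on the predictability of the controller behaviour (see the cited paper) and is not reproduced here. Purely to illustrate what a "simple performance index" over historical data can look like, the sketch below computes three generic textbook quantities instead: integral absolute error, settling time and actuator travel. All names and the sample values are assumptions.

```python
def integral_absolute_error(setpoint, measured, dt):
    """Sum of |SP - MV| over time: lower means tighter regulation."""
    return sum(abs(s - m) for s, m in zip(setpoint, measured)) * dt


def settling_time(setpoint, measured, dt, band):
    """Time after which the error stays within +/- band until the end."""
    last_outside = -1
    for i, (s, m) in enumerate(zip(setpoint, measured)):
        if abs(s - m) > band:
            last_outside = i
    return (last_outside + 1) * dt


def actuator_travel(output):
    """Total movement of the controller output: a proxy for actuator wear."""
    return sum(abs(b - a) for a, b in zip(output, output[1:]))


# Hypothetical loop response to a setpoint step, sampled every 2 s.
setpoint = [1.0, 1.0, 1.0]
measured = [0.0, 0.5, 1.0]
output = [0.0, 10.0, 5.0, 5.0]
print(integral_absolute_error(setpoint, measured, dt=2.0))
print(settling_time(setpoint, measured, dt=2.0, band=0.1))
print(actuator_travel(output))
```

Indices like these can be computed for every loop from archived data and ranked, which is one way to decide which of many PIDs deserve an engineer's attention first.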

18 UC7: LHC circuit monitoring
Condition monitoring analysis (in collaboration with TE-MPE)
Main goal: evaluation of the health of the superconducting circuits
  Degradation after 20 years of operation
  Monitoring conditions: anomalous change of current flows, impedance, circuit functioning …
What to monitor?
  Electrical circuits: magnets, power converters, switches …
Control system:
  16 WinCC OA servers
  44 industrial FECs
  2800 radiation-hard devices
  ~500M signals
  Readout from 10 kHz to 1 Hz

19 UC7: LHC circuit monitoring
Distributed complex event processing
Inefficient current flow of analysis:
  Manual data extraction, transformation and load
  Many independent scripts
  Time consuming
New expert system as common framework:
  Translate experts’ knowledge into formulation sets / rules
  Central knowledge database
  Rule templates to be reused, parametrized, validated
  Domain-specific language for simple formulation:
    Time reasoning and temporal expressions
    Mathematical and logical functions
Status: under development [lab testing]!
Rule definition: Truth(sma(I_Meas, 1m30s) > I_Threshold): duration(>=1h)
[Figure labels: Rules, List of similar assets]
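
The semantics of the rule language are not specified on the slide. Assuming that `sma` is a simple moving average over a trailing time window and that `duration(>=1h)` requires the condition to hold continuously for at least one hour, the example rule could be evaluated roughly as follows. The helper names `sma` and `holds_for` and the sample data are hypothetical; this is not the framework's actual engine.

```python
def sma(samples, window_s):
    """Trailing simple moving average.

    samples: list of (time_in_seconds, value) pairs sorted by time.
    Returns (time, mean of the values in the interval (time - window_s, time]).
    """
    averaged = []
    start = 0
    running = 0.0
    for i, (t, value) in enumerate(samples):
        running += value
        while samples[start][0] <= t - window_s:
            running -= samples[start][1]
            start += 1
        averaged.append((t, running / (i - start + 1)))
    return averaged


def holds_for(samples, predicate, min_duration_s):
    """True if predicate(value) holds continuously for min_duration_s."""
    run_start = None
    for t, value in samples:
        if predicate(value):
            if run_start is None:
                run_start = t
            if t - run_start >= min_duration_s:
                return True
        else:
            run_start = None
    return False


# Rule: Truth(sma(I_Meas, 1m30s) > I_Threshold): duration(>=1h)
I_THRESHOLD = 100.0  # hypothetical threshold in amperes

# Hypothetical current readings, one sample every 30 s for two hours.
sustained = [(t, 120.0) for t in range(0, 7201, 30)]
transient = [(t, 120.0 if t < 1800 else 80.0) for t in range(0, 7201, 30)]

for i_meas in (sustained, transient):
    fired = holds_for(sma(i_meas, 90), lambda v: v > I_THRESHOLD, 3600)
    print(fired)
```

Keeping the averaging and the duration test as separate, composable functions mirrors the slide's idea of rule templates that can be reused and parametrized across similar assets.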

20 UC3: Feed analytical results into the control system
Visualize the results of the analysis to the operators in order to take the proper actions! Status: Working prototype

21 Fault detection applied to industrial process
Collaboration with U. of Valladolid, Spain
Application of PCA (Principal Component Analysis) to detect faults or degradation as early as possible, allowing either preventive maintenance or an optimal corrective action by the operators, in order to increase the uptime of an industrial plant
PCA: an unsupervised, non-parametric statistical technique used in Machine Learning to reduce the dimensionality of a dataset to its most relevant components
Applications: CV, CRYO
Contact: Enrique Blanco
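
To make the PCA idea concrete, here is a deliberately small sketch rather than the collaboration's implementation: it fits only the first principal component by power iteration on data from normal operation, then scores a new sample by its reconstruction residual. A large residual means the usual relation between the signals is broken. The function names and the synthetic data are assumptions.

```python
import math


def fit_first_component(rows, iterations=200):
    """Return (mean, unit vector of the first principal component)."""
    n, d = len(rows), len(rows[0])
    mean = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - mean[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centred) / (n - 1) for j in range(d)]
           for i in range(d)]
    # Power iteration; assumes the start vector is not orthogonal
    # to the dominant direction of the data.
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return mean, v


def residual(row, mean, v):
    """Distance between a sample and its projection on the component."""
    centred = [x - m for x, m in zip(row, mean)]
    score = sum(a * b for a, b in zip(centred, v))
    return math.sqrt(sum((a - score * b) ** 2 for a, b in zip(centred, v)))


# Synthetic normal operation: three sensors driven by one process variable.
training = []
for i in range(50):
    x = i / 10
    training.append([x,
                     2 * x + 0.01 * ((i % 3) - 1),
                     -x + 0.01 * ((i % 2) - 0.5)])

mean, component = fit_first_component(training)
print(residual([2.0, 4.0, -2.0], mean, component))  # consistent sample
print(residual([2.0, 4.0, 1.0], mean, component))   # third sensor deviates
```

In practice more than one component is retained and the residual threshold is derived statistically from the training data; both choices are left out here to keep the sketch short.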

22 Pattern-based KPI discovery via CEP
Collaboration: Marburg University, Germany
Extract and identify relevant KPIs from alarms and data via online exploration, thanks to pre-emptive data indexing and CEP pattern matching techniques being researched at Marburg University (ChronicleDB)
Compare online query performance between Spark + HBase / Kudu / Impala and Marburg’s ChronicleDB, and apply online KPI extraction techniques for predictive improvements of alarms
Applications: EL distribution, Access Control
Contact: Matthias Braegger, Brice Copy

23 Next use-cases
Linac3 ion beam source optimization (in collaboration with BE-ABP):
  Find the optimal settings of control inputs to:
    Optimize the ion current in the beam transformer of LINAC3
    Minimize the variance of the ion current
  Learn from the ~10 years of LINAC3 operation data
  Assist the operators in choosing the best settings for operation
Vacuum leak detection (in collaboration with TE-VSC):
  Critical for the proper operation of the accelerators and LHC machine
  Initially for SPS, then for all the other vacuum systems
  Historical analysis of pressure sensors (Pirani and Penning gauges) combined with beam energy, beam mode and temperature sensors
  Inform operators to take the proper actions

24 Conclusions
Data Analytics already has an important added value today for understanding the behaviour of complex systems and optimizing them
  Big impact on operation, running and maintenance costs
BE-ICS has been working on Data Analytics with Siemens for the last 5 years
  Openlab collaboration
Growing community of users in different Groups and Departments
  Very distinct use-cases, not only related to controls
General approach for multi-domain applications
  Reusability of the developed analyses

25 Use-cases: a partial list
Online monitoring:
  Control System Health
  Electrical power quality of service
  Looking for heat in superconducting magnets
  Oscillation in cryogenics valves
  Discharge of superconducting magnets heaters
  Trending and forecast of the control process behaviour
  Vacuum leak detection
Faults diagnosis:
  Anomalies in the process regulation
  PLC anomalies
  Data loss detection
  Root-cause analysis for complex WinCC OA installations
  Analysis of sensors functioning and data quality
  Analysis of OPC-CAN middleware
  Analysis of electrical power cuts
  Cryogenic system breakdowns
Engineering design:
  Electrical consumption forecast
  Efficiency of electric network
  Predictive maintenance of control systems elements
  LINAC3 ion beam source optimization
  Vibration analysis
  Efficiency of control process
Thank you!
CERN BE-ICS

26 UC1: Compensation of the e- cloud thermal effect
In collaboration with TE-CRG (Benjamin Bradu)
LHC vacuum chamber:
  Cold bore at 1.9 K
  Beam screen (5-20 K) to intercept heat load
Interference with Cryo control system
Main issue: temperature increase close to the quench level trigger!
Thermal resistance R = 6 K/W, thermal capacitance C = 1200 J/K
[Figure labels: Ideal measurement cycle; Heat load of the screen]
Electrons released into the vacuum chamber, amplified via secondary emission from the chamber wall through a beam-induced multipacting process, give rise to an electron cloud. The incident cloud electrons heat the beam screen, for which only a limited cooling capacity is available. If the beam-screen heat load exceeds the available cooling, the cold superconducting magnets of the LHC arcs, which surround the beam pipe, will quench, i.e. they lose their superconducting state. The electron cloud may thereby limit the maximum permissible beam current of the LHC. The expected heat load depends not only on the beam current, but also on the bunch spacing, the bunch intensity and the time-dependent surface properties of the beam screen. The principal sources of primary electrons are the ionization of the residual gas at injection energy and photoemission at higher energies.
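
The slide gives a thermal resistance and a thermal capacitance but not the model they belong to. Assuming, for illustration only, that they describe a first-order lumped model C dT/dt = Q - (T - T0)/R with R read as 6 K/W, the time constant is R * C = 7200 s (2 hours) and a constant heat load Q raises the temperature by R * Q at steady state. The function name, the 1 W load and the 5 K starting temperature are hypothetical.

```python
def simulate_temperature(heat_load_w, r_k_per_w, c_j_per_k, t0_k, dt_s, steps):
    """Explicit-Euler integration of C dT/dt = Q - (T - T0) / R."""
    temperature = t0_k
    history = [temperature]
    for _ in range(steps):
        d_temp = (heat_load_w - (temperature - t0_k) / r_k_per_w) / c_j_per_k
        temperature += d_temp * dt_s
        history.append(temperature)
    return history


R = 6.0      # K/W, assumed reading of the slide's thermal resistance
C = 1200.0   # J/K, thermal capacitance from the slide
time_constant_s = R * C

# Hypothetical 1 W step in heat load, starting from 5 K, 10 s time step.
history = simulate_temperature(1.0, R, C, t0_k=5.0, dt_s=10.0, steps=7200)
print(time_constant_s)
print(history[720] - 5.0)   # temperature rise after one time constant
print(history[-1] - 5.0)    # rise after ten time constants
```

A time constant of hours compared with beam conditions that change in minutes is one way to see why pure feedback regulation can lag behind the heat load.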

27 UC1: Compensation of e- cloud thermal effect
Feed-forward loops to compensate the electron cloud heat load
Currently used in production for Cryo: keep the temperature away from the quench level trigger
Data analytics techniques to reduce computing time from weeks to hours!
Cloud computing to parallelize and distribute
Symbols:
  Qdbs = heat load on the beam screen
  Qsr = synchrotron radiation
  Qic = image current
  Qec = electron cloud
[Figure: compensation due to feed-forward loops]

28 UC6: Leak detection in Cooling and ventilation systems
Problem:
  Manually set alarm thresholds
  Changing filling conditions
Anomaly detection based on historical data:
  Detection of “large” leaks: anomalous valve opening time
  Detection of “small” leaks: anomalous frequency of valve openings
Achievements:
  Identification of anomalous behaviours
  Improved threshold settings
[Figure: distribution of valve openings [FSED_001_VMA400]]
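
The slide does not state how the thresholds are derived from historical data. One simple possibility, sketched here under that assumption, is to replace a manually set limit with a percentile of the historical distribution. The same function can be applied to valve opening durations ("large" leaks) or to the number of openings per day ("small" leaks). Function names, the 95th percentile and the data are hypothetical.

```python
def percentile_threshold(history, percent):
    """Value at the given percentile (0-100) of the historical data."""
    ordered = sorted(history)
    index = min(len(ordered) - 1, (percent * len(ordered)) // 100)
    return ordered[index]


def anomalous_indices(values, history, percent=95):
    """Indices of new values above the historical percentile threshold."""
    threshold = percentile_threshold(history, percent)
    return [i for i, value in enumerate(values) if value > threshold]


# Hypothetical historical filling-valve opening durations, in seconds.
historical_durations = list(range(10, 110))
new_durations = [50, 300, 104, 106]
print(percentile_threshold(historical_durations, 95))
print(anomalous_indices(new_durations, historical_durations))
```

Recomputing the threshold over a sliding historical window would let it follow changing filling conditions, which is the weakness of fixed thresholds that the slide points out.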

