LBDS TSU & AS-I failure report (Sept. 2016)

Presentation transcript:

LBDS TSU & AS-i failure report (Sept. 2016)
A. Antoine
27 September 2016
LBDS: TSU & AS-i Status

Content
  TSU
    Operation History
    Failure Impact
    Failure Analysis
  AS-i
    Specifications & Framework
    LBDS Configuration
  Conclusion

TSU

TSU: Operation History
  Version 1: prototype, never been in operation
  Version 2: in operation from LHC start-up to LS1
    First operational experience
    No critical hardware failure
    Poor diagnostic capability
    SPS compatibility required (new request)
    Potential major failure detected (internal review)
  Version 3: in operation from LS1
    Critical hardware failure on 1st July 2016
    Synchronous dump done
    LBDS B1: TSU-B replaced

TSU: Failure Impact
  LBDS worst-case failure!
  Thanks to the redundant, fail-safe design:
    Synchronous dump done
  Operation:
    Expert investigation needed
    MTTR: ~1 hour
    5 hours of downtime (LHC access required!)
  Cost:
    Materials: ~2500 CHF / intervention
    Expert & on-call service: ~500 CHF

TSU: Failure Analysis (1st July)
  FPGA fatal error (not recoverable)
  Power supplies suspected
  3 dependent + 2 independent power supplies on a TSU board:
    +1.2V -> FPGA core
    +1.8V -> EEPROM (Flash ROM for FPGA)
    +2.5V -> FPGA & CPLD
    +3.3V -> most components, FPGA interface included
    +5V -> CIBO powering
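
The rail list above can be held as a small lookup table, which makes it easy to ask which supplies touch a given component. This is only an illustrative sketch: the voltages and load descriptions are transcribed from the slide, while the names `TSU_RAILS` and `rails_mentioning` are hypothetical and not part of any TSU tooling.

```python
# Rail voltage (V) -> load description, transcribed from the slide.
# The dictionary and the helper below are illustrative assumptions.
TSU_RAILS = {
    1.2: "FPGA core",
    1.8: "EEPROM (Flash ROM for FPGA)",
    2.5: "FPGA & CPLD",
    3.3: "most components, FPGA interface included",
    5.0: "CIBO powering",
}


def rails_mentioning(component: str) -> list[float]:
    """Return the rail voltages whose load description names `component`."""
    needle = component.lower()
    return sorted(v for v, load in TSU_RAILS.items() if needle in load.lower())


fpga_rails = rails_mentioning("FPGA")
print(fpga_rails)
```

Four of the five listed rails mention the FPGA in their load description, which is consistent with the power supplies being the first suspects after an FPGA fatal error.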

TSU: Failure Analysis: abnormal startup
  [Figure: power-supply startup traces. Rails shown: +1.2V, +1.8V, +2.5V. Annotated levels: ~ +3V, ~ +1.8V]

TSU: Failure Analysis: normal startup (FPGA removed)
  [Figure: power-supply startup traces. Rails shown: +1.2V, +1.8V, +2.5V]

TSU: Failure Diagnosis
  An internal FPGA failure induces a short circuit on the +1.2V power supply
  Design review with N. Magnin:
    +1.2V power supply very noisy
    Noise with transients above FPGA specifications
    Some decoupling capacitors missing on the +5V power supply used to generate the +1.2V
  Still not clear why the FPGA creates a short circuit!

TSU: Failure Diagnosis: Power Supplies Noise
  [Figure: noise traces on the +1.2V and +1.8V rails. Annotations: ~250mV, ~250mV]
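
To put the ~250mV annotations in proportion, the noise can be expressed as a fraction of each nominal rail voltage. The sketch below assumes that one ~250mV annotation applies to each of the two rails shown (+1.2V and +1.8V). The slide does not state this pairing explicitly, nor whether the value is peak-to-peak, so the result is indicative only.

```python
def noise_fraction(noise_v: float, rail_v: float) -> float:
    """Noise amplitude as a fraction of the nominal rail voltage."""
    return noise_v / rail_v


NOISE_V = 0.250  # ~250 mV, read from the slide annotations (assumed per rail)

for rail_v in (1.2, 1.8):
    pct = 100 * noise_fraction(NOISE_V, rail_v)
    print(f"+{rail_v}V rail: noise is about {pct:.1f} % of nominal")
```

Under these assumptions the noise is roughly a fifth of the +1.2V core supply, which illustrates why the design review flagged that rail in particular.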

TSU: Failure Diagnosis: Power Supplies Noise
  The +5V from VME is the source of all power supplies …
  [Figure: noise trace on the +5V rail]

Conclusion (TSU)
  1 critical failure in 10 years of operation
  MTTR of 5 hours
  Redundant TSU strategy worked fine:
    Detection of the failure
    Synchronous dump done
  Corrective action to be validated and deployed to remove noise on the +5V and +1.2V power supplies
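
As a rough order-of-magnitude check, the figures on this slide (1 failure, 5 hours, 10 years) can be turned into a naive unavailability estimate. This sketch assumes calendar time rather than actual beam-operation hours, and a single event is far too little data for a meaningful statistic. The function name `unavailability` is hypothetical.

```python
HOURS_PER_YEAR = 8760  # calendar hours in a non-leap year (assumption)


def unavailability(failures: int, downtime_h: float, period_years: float) -> float:
    """Fraction of the period lost to failures, assuming equal downtime per failure."""
    return failures * downtime_h / (period_years * HOURS_PER_YEAR)


# Figures from the slide: 1 critical failure, 5 hours, 10 years of operation.
tsu_unavailability = unavailability(failures=1, downtime_h=5, period_years=10)
print(f"TSU unavailability: {tsu_unavailability:.1e}")
```

The estimate depends entirely on the calendar-time assumption. Using scheduled operation hours instead would give a larger figure.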

AS-i

AS-i: Actuator-Sensor Interface
  Specifications:
    CEI/IEC 62026-2 and EN 50295 standards
    Data on power line (decoupling filter)
    8-bit data serial bus with safety capability (SIL3)
    Up to 62 standard nodes or 31 safety nodes
    Reaction time < 10ms
    Up to 100m length (300m with repeater)
  Framework:
    1x AS-i master controller
    1x dedicated power supply
    Unshielded 2-wire cable wrapped in an electrical insulator, for data and power
    Actuators & sensors
    Safety monitor (when needed)
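
The limits quoted above lend themselves to a simple configuration check. The sketch below encodes only the numbers from this slide (62 standard or 31 safety nodes, 100m, or 300m with a repeater). How standard and safety nodes may be mixed on one segment is not stated here, so the example treats a segment as holding a single kind of node. `Segment` and `violations` are hypothetical names.

```python
from dataclasses import dataclass

# Limits as quoted on the slide.
MAX_NODES = {"standard": 62, "safety": 31}
MAX_LENGTH_M = {False: 100, True: 300}  # keyed by "repeater used?"


@dataclass
class Segment:
    node_kind: str          # "standard" or "safety" (single kind: an assumption)
    node_count: int
    length_m: float
    with_repeater: bool = False


def violations(seg: Segment) -> list[str]:
    """List the quoted limits that this segment description exceeds."""
    problems = []
    if seg.node_count > MAX_NODES[seg.node_kind]:
        problems.append(f"too many {seg.node_kind} nodes: {seg.node_count}")
    if seg.length_m > MAX_LENGTH_M[seg.with_repeater]:
        problems.append(f"cable too long: {seg.length_m} m")
    return problems


print(violations(Segment("safety", 32, 150)))
```

A segment with 32 safety nodes on 150m of cable without a repeater fails both checks, while the same cable length with a repeater would pass the length check.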

AS-i: LBDS Configuration

AS-i: Operation History
  2 hardware failures in 10 years of operation
  Same failure signature … but one was the AS-i F Link module
  All 4 systems impacted (beam 1 & 2)
  First occurrence shortly before LS1 (6 years of operation)
    Curative maintenance (on-call service)
    Early LS1: preventive maintenance done, with replacement of AS-i F Link & power supply components
  Second occurrence some weeks ago on 3 systems
    Preventive maintenance during TS3 2016 done, with replacement of all AS-i power supplies

AS-i: Failure Impact
  LBDS abruptly stopped (as an AUE)
  AS-i worst-case failure (power and discharging switches switched off)
  Synchronous dump (thanks to fail-safe design)
  Operation:
    Short MTTR: 45 min
    4h of downtime / intervention (access to the LHC needed!)
  Cost:
    Materials: ~1000 CHF / intervention
    On-call service: ~300 CHF
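
The per-intervention figures quoted for the TSU and AS-i failures can be put side by side with a few lines of arithmetic. The sketch below uses the approximate values from the two Failure Impact slides and assumes the service cost applies once per intervention, which the slides do not state explicitly. The dictionary layout and `total_cost_chf` are illustrative names.

```python
# Approximate figures from the TSU and AS-i "Failure Impact" slides.
INTERVENTIONS = {
    "TSU": {"materials_chf": 2500, "service_chf": 500, "downtime_h": 5},
    "AS-i": {"materials_chf": 1000, "service_chf": 300, "downtime_h": 4},
}


def total_cost_chf(system: str) -> int:
    """Materials plus service cost for one intervention (service assumed per intervention)."""
    entry = INTERVENTIONS[system]
    return entry["materials_chf"] + entry["service_chf"]


for name, entry in INTERVENTIONS.items():
    print(f"{name}: ~{total_cost_chf(name)} CHF, {entry['downtime_h']} h downtime")
```

Under these assumptions a TSU intervention costs roughly 3000 CHF and an AS-i intervention roughly 1300 CHF, with comparable downtime because both require access to the LHC.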

AS-i: Failure Diagnosis
  2 components identified as potentially responsible for the AS-i failure:
    AS-i master controller (AS-i F Link)
    AS-i power supply
  Master controller:
    Controller down and not resettable!
    No software diagnosis available
  Power supply:
    Output filter showed degradation (capacitors)
    Out-of-specification connection of the AS-i bus (spring terminal -> no pod on wire allowed!)

AS-i: Failure Diagnosis
  Scenario 1:
    Data on the AS-i bus are altered by the degradation of the capacitor of the power supply output filter
    The AS-i master controller gets wrong reply messages from safety sensors (data corruption)
    The AS-i master controller goes to safe state with failure (not resettable)
  Scenario 2:
    Bad connections (use of pods on spring terminals)
    Data corruption
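
The chain of events in scenario 1 (a corrupted reply leading to a latched safe state) can be illustrated with a toy state machine. This is a deliberately simplified model and not the AS-i safety protocol: the reply comparison, the `MasterModel` class and the `corrupt` helper are all hypothetical, chosen only to show the "not resettable" latching behavior described above.

```python
from enum import Enum


class State(Enum):
    RUNNING = "running"
    SAFE_LATCHED = "safe state with failure (not resettable)"


class MasterModel:
    """Toy model of a master that latches into a safe state on a bad reply."""

    def __init__(self) -> None:
        self.state = State.RUNNING

    def on_reply(self, expected: int, received: int) -> State:
        # Any mismatch between expected and received data is treated as corruption.
        if self.state is State.RUNNING and received != expected:
            self.state = State.SAFE_LATCHED
        return self.state

    def reset(self) -> bool:
        """Attempt a reset; returns False once the safe state is latched."""
        return self.state is State.RUNNING


def corrupt(word: int, bit: int) -> int:
    """Flip one bit, standing in for noise on the bus."""
    return word ^ (1 << bit)


master = MasterModel()
master.on_reply(expected=0b1010, received=0b1010)              # clean reply
master.on_reply(expected=0b1010, received=corrupt(0b1010, 2))  # corrupted reply
print(master.state, master.reset())
```

In this model a single corrupted reply is enough to latch the safe state, and later clean replies do not clear it, mirroring the "controller down and not resettable" symptom. Scenario 2 would enter the same path through a different corruption source.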

AS-i: Corrective action during TS3
  Done on all systems (4x)
  New AS-i power supply
  Remove all pods on wires connected with spring terminals

Conclusion (AS-i)
  2 periods of failures in 10 years
  MTTR short but MTBF increase after one occurrence (burst behavior)
  Fail-safe design: synchronous dump done
  Corrective action during TS3:
    Replacement of all AS-i power supplies
    Removal of wire pods on spring terminals