
1 Reliability and Availability of the Large Hadron Collider (LHC) Machine Protection System
Jan Uythoven, CERN, Geneva, Switzerland
Thanks to R. Schmidt, B. Goddard, R. Filippini* and the many other colleagues working on the LHC Protection System
*Presently at PSI, Zürich

2 The Large Hadron Collider (LHC) at CERN, Geneva
Jan Uythoven, CERN, ITER RAMI Workshop, 6-7 December 2007
• The world's largest particle accelerator, with a circumference of 27 km
• 1232 superconducting dipole magnets operating at 1.9 K
• Operation with beam foreseen for 2008

3 LHC Layout
[Figure: LHC layout]

4 LHC Stored Energy
For nominal beam intensity at 7 TeV:
• Energy stored in one beam: 360 MJ
• Energy stored in the superconducting magnets: 10 GJ
Energy needed to heat and melt one kg of copper: 700 kJ
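To put these numbers in perspective, the sketch below divides the stored energies by the 700 kJ needed to heat and melt one kilogram of copper, and cross-checks the 360 MJ figure from nominal beam parameters. The bunch count (2808) and protons per bunch (1.15 × 10^11) are the usual LHC design values; they are not stated on this slide, so treat them as assumptions.

```python
# Sketch: stored-energy arithmetic for the "LHC Stored Energy" slide.
# Assumed LHC design values (not on the slide): 2808 bunches, 1.15e11 protons/bunch.
EV_TO_J = 1.602176634e-19  # joules per electronvolt


def beam_energy_joules(n_bunches: int, protons_per_bunch: float, energy_ev: float) -> float:
    """Total kinetic energy stored in one beam."""
    return n_bunches * protons_per_bunch * energy_ev * EV_TO_J


def copper_melt_equivalent_kg(energy_j: float, j_per_kg: float = 700e3) -> float:
    """Mass of copper this energy could heat and melt (700 kJ/kg, from the slide)."""
    return energy_j / j_per_kg


beam_j = beam_energy_joules(n_bunches=2808, protons_per_bunch=1.15e11, energy_ev=7e12)
print(f"one beam: {beam_j / 1e6:.0f} MJ")
print(f"copper melted by 360 MJ: {copper_melt_equivalent_kg(360e6):.0f} kg")
print(f"copper melted by 10 GJ: {copper_melt_equivalent_kg(10e9):.0f} kg")
```

With these assumptions one beam carries about 362 MJ, enough to melt roughly half a tonne of copper; the magnet energy corresponds to about 14 tonnes.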

5 Quench Protection and Energy Extraction System
• When one magnet quenches, the quench heaters of this magnet are fired
• The current in the quenched magnet decays in about 200 ms
• The current from the other magnets in series flows through the bypass diode, which can withstand the current for about 100-200 seconds
[Diagram: power converter feeding Magnet 1, Magnet 2, …, Magnet i, …, Magnet 154 in series]

6 13 kA Energy Extraction
[Photographs: energy extraction installation in a tunnel adjacent to the accelerator]
• Resistors absorbing the energy
• Switches for switching the resistors into series with the magnets
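When the switches put the resistors in series with the magnets, the stored magnetic energy E = ½LI² is dissipated in the resistors and the current decays exponentially with time constant τ = L/R. The sketch below illustrates this with hypothetical round numbers: the circuit inductance (15 H), extraction resistance (0.15 Ω) and initial current (11.85 kA) are assumptions chosen for illustration, not values from the talk.

```python
import math

# Hypothetical circuit parameters (illustrative only, not from the talk).
L_HENRY = 15.0      # total inductance of the magnet chain
R_OHM = 0.15        # extraction resistance switched into series
I0_AMP = 11_850.0   # current when the resistors are switched in


def time_constant(l_henry: float, r_ohm: float) -> float:
    """Decay time constant tau = L / R of an RL circuit."""
    return l_henry / r_ohm


def current(t_s: float, i0: float, tau: float) -> float:
    """Current I(t) = I0 * exp(-t / tau) after the resistors are switched in."""
    return i0 * math.exp(-t_s / tau)


def stored_energy(l_henry: float, i_amp: float) -> float:
    """Magnetic energy E = 1/2 * L * I^2."""
    return 0.5 * l_henry * i_amp ** 2


tau = time_constant(L_HENRY, R_OHM)
e0 = stored_energy(L_HENRY, I0_AMP)
print(f"tau = {tau:.0f} s, initial energy = {e0 / 1e9:.2f} GJ")
for t in (0.0, 100.0, 200.0):
    i = current(t, I0_AMP, tau)
    remaining = stored_energy(L_HENRY, i) / e0  # equals exp(-2t/tau)
    print(f"t={t:5.0f} s  I={i:8.0f} A  energy left={remaining:.3f}")
```

With these numbers τ = 100 s and about 1 GJ is stored, the same order as the 10 GJ total spread over the 8 sectors. A time constant of this size is comparable to the 100-200 s the bypass diodes can carry the current, which suggests why the extraction resistance and diode rating have to be chosen together.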

7 Quench Protection and Energy Extraction System
• 8 separate systems: one for each sector
• Energies per sector similar to those of the HERA and Tevatron accelerators
• Needs to work very reliably, as the damage potential is huge
• Reliability studies of the system have been done
• 'Traditional' technologies
• Limited dependence on other systems
This talk is mainly about protection from the beam energy.
PhD thesis A. Vergara: http://documents.cern.ch/cgi-bin/setlink?base=preprint&categ=cern&id=cern-thesis-2004-019

8 How to Protect the Machine from the Beam Energy?
Machine Protection System, which:
• Detects "any fault" in the machine:
  - Hardware not working properly, despite the fault-tolerant design of safety-critical systems
  - Effects of failures, including beam instabilities, leading to beam losses
• Safely dumps the beam before it can cause any damage
• Reacts fast: beams are to be dumped within 3 turns of detecting a problem, i.e. about 300 µs
[Figure: Beam Dump Block, where the beam should go in case any 'problem' is detected]
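The 300 µs budget follows from the revolution time: at these energies the protons travel at essentially the speed of light, so one turn of a 27 km ring takes about 90 µs. The sketch uses the rounded 27 km circumference quoted earlier in the talk, so the result is approximate.

```python
# Sketch: reaction-time budget for "dump within 3 turns".
C_LIGHT = 299_792_458.0      # speed of light, m/s
CIRCUMFERENCE_M = 27_000.0   # rounded LHC circumference quoted in the talk


def revolution_time_s(circumference_m: float, speed_m_s: float = C_LIGHT) -> float:
    """Time for one turn, assuming the beam moves at (essentially) the speed of light."""
    return circumference_m / speed_m_s


def dump_budget_s(turns: int, circumference_m: float = CIRCUMFERENCE_M) -> float:
    """Maximum time between failure detection and a completed dump."""
    return turns * revolution_time_s(circumference_m)


print(f"one turn: {revolution_time_s(CIRCUMFERENCE_M) * 1e6:.1f} us")
print(f"3 turns:  {dump_budget_s(3) * 1e6:.0f} us")
```

Three turns come out at about 270 µs, i.e. of the order of the 300 µs quoted on the slide.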

9 Systems Detecting Failures and LHC Beam Interlocks
[Diagram; annotation: little beam dependence]

10 Principle of the LHC Machine Protection System
• 'User systems' can detect failures and send a hardwired signal to the Beam Interlock System; they range from experimental detectors to vacuum valves
• Each user system provides a status signal, the user permit signal
• The Beam Interlock System combines the user permits and produces the beam permit
• The beam permit is a hardwired signal that is provided to the dump kicker
• The Beam Dumping System combines many high-technology techniques
[Diagram: user permit signals → Beam Interlock System → beam 'permit' → LHC dump kicker. Hardware links/systems, fully redundant; many different technologies]
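The combination performed by the Beam Interlock System can be thought of as a logical AND: the beam permit stays TRUE only while every user permit is TRUE, and any single FALSE leads to a dump request. The sketch below models this in software purely for illustration; the real system is hardwired and redundant, and the user-system names are hypothetical examples.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UserSystem:
    name: str      # hypothetical user-system label
    permit: bool   # True = "OK to have beam"


def beam_permit(users: List[UserSystem]) -> bool:
    """Beam permit = logical AND of all user permits; any False removes it."""
    return all(u.permit for u in users)


def systems_requesting_dump(users: List[UserSystem]) -> List[str]:
    """Names of the user systems whose permit is removed."""
    return [u.name for u in users if not u.permit]


users = [
    UserSystem("vacuum_valves", True),
    UserSystem("experiment_detector", True),
    UserSystem("beam_loss_monitors", False),
]
if not beam_permit(users):
    print("dump requested by:", systems_requesting_dump(users))
```

Encoding the permit as TRUE = OK, rather than TRUE = fault, is what makes such a scheme fail-safe: a cut link or a dead module reads as a removed permit and therefore as a dump request.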

11 Organisation for the LHC
• Machine protection includes many different hardware systems
• Many different departments and groups are responsible for their equipment
• Machine protection is coordinated by two working groups:
  - General coordination: definition of the system
  - Commissioning working group: emphasis on the procedures to be applied
• Reviews and external audits are used to obtain external advice:
  - General review of the LHC Machine Protection System
  - Audit of the Beam Interlock Controller done
  - Audit of the Beam Dumping System planned
  - Audit of the Beam Loss Monitoring System requested

12 Requirements Concerning the Machine Protection System
• Safety assessment ('reliability'):
  - The IEC 61508 standard defines Safety Integrity Levels (SIL), ranging from SIL1 to SIL4
  - Based on risk classes = consequence × frequency
  - The Machine Protection System for the LHC should be SIL3, taking the definition for protection systems, i.e. a probability of failure between 10^-8 and 10^-7 per hour (because of the short mission times)
  - Catastrophe = the beam should have been dumped and this did not take place; this can possibly cause large damage
• Availability:
  - Definition:
    - The beam is dumped when this was not required
    - Operation cannot take place because the protection system does not give the green light (is not ready)
  - Requirement:
    - Definition not according to any standard
    - Downtime comparable to other accelerator equipment; at most some tens of affected operations per year

13 Approach Adopted ("Strategy")
• End of the '90s: start of an "Interlock Manager", which later continued as a Machine Protection System
  - Until then, particle accelerators mainly considered equipment protection
  - Since then, 'machine protection' has become a common approach in high-power accelerators
• Dual approach:
  - Prevent faults at the source (= the old-fashioned approach), and
  - Detect the effect resulting from any fault, including beam instabilities, and react fast enough to prevent damage
• Deployment in the SPS accelerator to test the concepts

14 Are the Requirements Fulfilled?
• Reduce the protection system to its basic elements; the other systems give additional protection
[Diagram of the basic elements:
  BIC: Beam Interlock Controller
  LBDS: LHC Beam Dumping System
  BLM: Beam Loss Monitors (6 BLMs per superconducting quadrupole, 4000 in total)
  PIC: Power Interlock Controller
  QPS: Quench Protection System]

15 Systems Detecting Failures and LHC Beam Interlocks
[Diagram]

16 Main Systems
• Thorough design from the start
• Based on redundancy
• For each of the 5 main components of the Machine Protection System, dependability numbers (= reliability and availability) have been calculated
  - Basically one PhD thesis per system!
  - Some details of the Beam Dumping System calculations are given later
• Assume an operational scenario
• Combining these numbers gives the Machine Protection System dependability estimate
  - Shows the weak links

17 Resulting Unsafety and Availability Numbers

System       Unsafety/y (probability)              False dumps/y (average)   Std. dev.
LBDS (OP1)   2.4 × 10^-7 (2×)                      4 (2×)                    ±1.9
BIC          1.4 × 10^-8                           0.5                       ±0.5
BLM          1.44 × 10^-3 (front-end),             17                        ±4.0
             0.06 × 10^-3 (back-end VME)
PIC          0.5 × 10^-3                           1.5                       ±1.2
QPS          0.4 × 10^-3                           15.8                      ±3.9
MPS          2.3 × 10^-4 (5.75 × 10^-8/h, SIL3)    41                        ±6.0

ASSUMPTIONS
• Operational scenario: 200 days/year of operation; 400 beam operations (10 h each), each followed by checks (2 h)
• Diagnostics effectiveness: LBDS and BIC are "as good as new" after checks (BLM partially); QPS and PIC are "as good as new" after periodic inspection or a power abort
• Dump request (DR) apportionment: 60% planned dumps, 15% fast beam losses, 15% slow beam losses, 10% others
• Redundancy: no cross-redundancy within the Beam Loss Monitors (P = 0, worst case)

18 Sensitivity of Safety to the Model Parameters
• Sensitivity to the type of dump request: fast beam losses contribute two orders of magnitude more to the overall unsafety. Assuming 45% fast beam losses instead of 15%, the unsafety moves from 2.3 × 10^-4/y to 6.8 × 10^-4/y → SIL2
• Sensitivity to the redundancy of the BLM: with the same dump request apportionment, a beam loss is detectable by two monitors with a probability 0 < P < 1. If P moves from 0 to 1, safety is recovered: the unsafety goes from 6.8 × 10^-4/y to 2.8 × 10^-5/y → SIL4
[Plots: results on a logarithmic scale]
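The per-year unsafety figures can be related to the per-hour SIL bands of IEC 61508 for high-demand/continuous operation (SIL1: 10^-6 to 10^-5, SIL2: 10^-7 to 10^-6, SIL3: 10^-8 to 10^-7, SIL4: 10^-9 to 10^-8 dangerous failures per hour). Dividing 2.3 × 10^-4/y by the 400 × 10 h = 4000 h of beam operation from the slide 17 assumptions reproduces the 5.75 × 10^-8/h quoted there; the sketch applies the same conversion to the sensitivity cases. Using beam hours rather than calendar hours as the denominator is our reading of those assumptions.

```python
from typing import Optional

# Sketch: per-year unsafety -> per-hour figure -> IEC 61508 SIL band
# (high-demand / continuous mode). Beam hours follow the slide 17 scenario.
BEAM_HOURS_PER_YEAR = 400 * 10  # 400 missions of 10 h

SIL_BANDS = [  # (SIL, lower bound inclusive, upper bound exclusive), per hour
    (4, 1e-9, 1e-8),
    (3, 1e-8, 1e-7),
    (2, 1e-7, 1e-6),
    (1, 1e-6, 1e-5),
]


def per_hour(unsafety_per_year: float, hours: float = BEAM_HOURS_PER_YEAR) -> float:
    return unsafety_per_year / hours


def sil_level(pfh: float) -> Optional[int]:
    """Return the SIL whose band contains pfh, or None if outside SIL1-SIL4."""
    for sil, low, high in SIL_BANDS:
        if low <= pfh < high:
            return sil
    return None


cases = {
    "baseline (15% fast losses)": 2.3e-4,
    "45% fast losses, P = 0": 6.8e-4,
    "45% fast losses, P = 1": 2.8e-5,
}
for label, per_year in cases.items():
    pfh = per_hour(per_year)
    print(f"{label}: {pfh:.2e} /h -> SIL{sil_level(pfh)}")
```

The three cases land in SIL3, SIL2 and SIL4 respectively, matching the levels quoted on slides 17 and 18.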

19 Failure Rates of a Single Sub-System (…open brackets…)
[Same table as slide 17; callout: LHC Beam Dumping System]

20 The LBDS: LHC Beam Dumping System
LBDS inventory:
• Extraction: 15 kicker magnets + 15 generators; 15 septum magnets + 1 power converter
• Dilution: 10 kicker magnets + 10 generators
• Absorption: one dump block
• Electronics: beam energy measurement (BEM), beam energy tracking (BET), triggering and re-triggering, post mortem diagnostics (check of every beam dump)
• Beam line: 975 m from the extraction point to the TDE
[Diagram of the dump line:
  1) MKD: the 15 kicker magnets deflect the beam horizontally
  2) Q4: the quadrupole enhances the horizontal deflection
  3) MSD: the 15 septum magnets deflect the beam vertically
  4) MKB: the 10 kicker magnets dilute the beam energy
  5) TDE: the beam is absorbed in a graphite block
Figure: the beam sweep at the front face of the TDE absorber at 450 GeV]

21 The LBDS: Safety in Design
Fault-tolerant features:
• No single point of failure should exist in the LBDS
• Redundancy is introduced to allow failures up to a certain threshold
• Surveillance detects failures and issues a fail-safe dump request
[Diagram annotations:
  - Redundancy: 14 out of 15 MKD, 1 out of 2 MKD generator branches. Surveillance: energy tracking, re-triggering
  - Redundancy: 1 out of 4 MKBH, 1 out of 6 MKBV. Surveillance: energy tracking
  - Surveillance: energy tracking
  - Surveillance: energy tracking, fast current change monitoring
  - Redundancy: 1 out of 2 trigger generation and distribution. Surveillance: synchronization tracking
  - Surveillance: TX/RX error detection, voting of inputs]
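A "k out of n" redundancy figure translates into a binomial model if the n units fail independently, each with the same probability q during a mission, and the function needs at least k working units. The sketch applies this to the 14-out-of-15 MKD kickers and the 1-out-of-2 generator branches. The per-unit failure probability is a hypothetical placeholder, and the independence assumption ignores common-cause failures, which a real analysis has to treat separately.

```python
from math import comb


def k_out_of_n_reliability(k: int, n: int, q: float) -> float:
    """Probability that at least k of n independent identical units work,
    each unit failing with probability q during the mission."""
    p = 1.0 - q
    return sum(comb(n, i) * p ** i * q ** (n - i) for i in range(k, n + 1))


Q_UNIT = 1e-4  # hypothetical per-unit failure probability per mission

mkd = k_out_of_n_reliability(14, 15, Q_UNIT)       # 14 of 15 kickers needed
no_margin = k_out_of_n_reliability(15, 15, Q_UNIT)  # if all 15 were needed
branches = k_out_of_n_reliability(1, 2, Q_UNIT)     # 1 of 2 generator branches needed

print(f"MKD 14/15 unreliability: {1 - mkd:.2e}")
print(f"MKD 15/15 unreliability: {1 - no_margin:.2e}")
print(f"1/2 branches unreliability: {1 - branches:.2e}")
```

With one spare kicker the unreliability drops from roughly 15q to roughly 105q², which is why a single redundant unit is so effective when q is small.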

22 The Modelling Framework
FMECA = Failure Modes, Effects and Criticality Analysis
• No detailed assessment of fault consequences
• Two failure modes only: fail safe and fail unsafe

23 Reliability Prediction
• Failure rates are deduced at component level from the standard literature (e.g. Military Handbook 217F)
• The logic expressions of the failure modes are translated into probabilities and then into failure rates
Example: the failure mode F1 MKD of the MKD system:
1. Logic expression: 2 out of 15 [(PT1A AND PT1B) OR (SP1A AND SP1B) OR (SC1A AND SC1B) OR (CP2A AND CP2B) OR (COS12A AND COS12B) OR (COS22A AND COS22B) OR M]
2. Probability
3. Failure rate
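The three steps can be illustrated on the F1 MKD expression. Assuming all components fail independently, an AND of a redundant pair fails with probability qA·qB, the OR inside the brackets fails unless every term survives, and "2 out of 15" is read as: the mode occurs when at least two of the 15 kicker channels fail. The component rates, the equal rates for the A and B halves, and the 10 h mission are hypothetical placeholders; the final number is a rate averaged over the mission, not the time-dependent rate of the full model.

```python
import math

MISSION_H = 10.0  # hypothetical mission length in hours

# Hypothetical per-component failure rates (per hour); A and B halves assumed equal.
RATES = {
    "PT1": 2e-6, "SP1": 1e-6, "SC1": 5e-7, "CP2": 1e-6,
    "COS12": 3e-7, "COS22": 3e-7,
}
RATE_M = 1e-8  # hypothetical rate for the non-redundant term M


def q(rate_per_h: float, t_h: float) -> float:
    """Probability of failure within t for a constant failure rate."""
    return -math.expm1(-rate_per_h * t_h)


def channel_failure_prob(t_h: float) -> float:
    """P[(A1 AND B1) OR (A2 AND B2) OR ... OR M] for one kicker channel."""
    survive = 1.0 - q(RATE_M, t_h)
    for rate in RATES.values():
        pair_fails = q(rate, t_h) * q(rate, t_h)  # both A and B fail
        survive *= 1.0 - pair_fails
    return 1.0 - survive


def at_least_k_of_n_fail(k: int, n: int, p_fail: float) -> float:
    """Binomial probability that k or more of n independent channels fail."""
    return sum(math.comb(n, i) * p_fail ** i * (1 - p_fail) ** (n - i)
               for i in range(k, n + 1))


p_chan = channel_failure_prob(MISSION_H)
p_mode = at_least_k_of_n_fail(2, 15, p_chan)           # step 2: probability
avg_rate = -math.log1p(-p_mode) / MISSION_H             # step 3: mission-averaged rate
print(f"channel: {p_chan:.3e}, F1 MKD: {p_mode:.3e}, rate: {avg_rate:.3e} /h")
```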

24 Results: Failure Modes and Rates of the LBDS
[Figure: MKD]
• The FMECA and reliability prediction have been performed for all sub-systems of the LBDS
• More than 2100 failure modes have been classified at component level
• They have been arranged into 21 failure modes at system level

25 Operation Scenarios for One Mission
State transition diagrams (compact state-based approach):
• Fail-safe rates FS\Xk decrease with time
• Fail-unsafe rates FU\Xk increase with time
States:
• Available: X0, X1 (no BETS), X2 (no RTS), X3 (no BETS and no RTS)
• Fail safe: X4
• Failed unsafe: X5

26 State Transition Diagrams: the Sequence of Missions and Checks
• Missions are ended either by internal false dumps or by external dump requests
• At checks the system is recovered to its initial state
• The process starts in X = 0 of mission 1 and stops when one year of operation is reached
• The sequence of N missions and checks is a non-homogeneous Markov process of 2 × N × 5 states
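A much-reduced version of this model shows how the mission-and-check sequence works. Within one mission the sketch keeps only three states (available, fail safe, failed unsafe) with constant hypothetical rates, so it is homogeneous, unlike the time-dependent rates of the real model. A false dump ends the mission safely; a mission that reaches its dump request while in the failed-unsafe state counts as an unsafe event. Because checks restore the system to "as good as new", missions are independent trials, and the yearly figure follows from the per-mission probability.

```python
import math


def p_unsafe_at_end(t_h: float, lam_fs: float, lam_fu: float) -> float:
    """Probability of being in 'failed unsafe' when the mission ends at time t.
    Competing constant rates: available -> fail safe (lam_fs) and
    available -> failed unsafe (lam_fu), both absorbing within the mission."""
    total = lam_fs + lam_fu
    return lam_fu / total * -math.expm1(-total * t_h)


def p_unsafe_per_year(p_mission: float, n_missions: int) -> float:
    """Checks reset the system, so the missions are independent trials."""
    return -math.expm1(n_missions * math.log1p(-p_mission))


# Hypothetical rates per hour; 400 missions of 10 h as in the operational scenario.
p_m = p_unsafe_at_end(10.0, lam_fs=1e-3, lam_fu=1e-9)
print(f"per mission: {p_m:.2e}, per year: {p_unsafe_per_year(p_m, 400):.2e}")
```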

27 Operational Scenario
• Missions of random duration alternate with 2 hours of checks, over 200 days of operation
• In addition to a false dump, the end of a mission is determined by an external dump request, which is either a planned dump request (Weibull distributed) or a beam-induced dump request
• Dump request rates:
  - Planned dump (Weibull): parameters 5 and 1/11
  - Beam-induced dump: parameters 0.001 and 0.1
[Plot: distribution of dump requests]
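If the planned-dump parameters are read as a Weibull shape of 5 and a rate of 1/11 per hour, i.e. a scale of 11 h (this reading of the transcript is an assumption), the mean mission length comes out at about 10 h, consistent with the 10 h beam operations assumed on slide 17. The sketch computes the analytic mean and samples mission durations with Python's standard library.

```python
import math
import random
from typing import List

SHAPE = 5.0
SCALE_H = 11.0  # assumed: inverse of the 1/11 rate parameter, in hours


def weibull_mean(scale: float, shape: float) -> float:
    """Mean of a Weibull distribution: scale * Gamma(1 + 1/shape)."""
    return scale * math.gamma(1.0 + 1.0 / shape)


def sample_missions(n: int, seed: int = 1) -> List[float]:
    """Draw n planned-dump mission durations (hours)."""
    rng = random.Random(seed)
    # random.weibullvariate(alpha, beta): alpha is the scale, beta the shape.
    return [rng.weibullvariate(SCALE_H, SHAPE) for _ in range(n)]


durations = sample_missions(10_000)
print(f"analytic mean: {weibull_mean(SCALE_H, SHAPE):.2f} h")
print(f"sample mean:   {sum(durations) / len(durations):.2f} h")
```

The analytic mean is about 10.1 h. A shape of 5 also gives a fairly narrow spread (standard deviation of roughly 2.3 h), which suits planned dumps at the end of a fill.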

28 (…close brackets…)
[Same table as slide 17; callout: LHC Beam Dumping System]
• PhD thesis Roberto Filippini: http://doc.cern.ch/archive/electronic/cern/preprints/thesis/thesis-2006-054.pdf
• Availability of the other systems has not been studied; this can be done if required

29 Analysis Also Being Done with a Different Approach
• Hybrid methodology combining fault trees for component failure rates with time-domain simulations of the complete system
• Results concerning protection system reliability and beam availability
• Option to disable part of a system and see the effect
• Collaboration with the Laboratory for Safety Analysis, ETH Zürich (Sigrid Wagner)
• Ongoing…

30 Key Issues Concerning the Design of Sub-Systems
Requirements to obtain a "safe system":
• No single point of failure
• Redundancy of critical components
• Redundancy of signal paths between (sub-)systems
• Periodic checks to get back to a state which is 'as good as new'
  - Failure rates of redundant systems increase in time and get back to zero after a check (different from aging)
• Surveillance of critical signals
• Safe mission abort
• Trade-off between availability and reliability
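The remark that failure rates of redundant systems increase in time and return to zero after a check can be made concrete with a 1-out-of-2 system of identical units, each with constant rate λ. Its reliability is R(t) = 2e^(-λt) - e^(-2λt), and its hazard h(t) = -R'(t)/R(t) = 2λ(1 - e^(-λt)) / (2 - e^(-λt)) starts at zero and rises towards λ as the chance grows that one unit has already failed unnoticed. A check that confirms both units work restarts the clock at t = 0. The unit rate and check intervals below are hypothetical.

```python
import math


def hazard_1oo2(t_h: float, lam: float) -> float:
    """Hazard rate of a 1-out-of-2 parallel system of identical units with constant rate lam."""
    e = math.exp(-lam * t_h)
    return 2.0 * lam * (1.0 - e) / (2.0 - e)


def unreliability_1oo2(t_h: float, lam: float) -> float:
    """Probability that both units have failed by time t: (1 - e^{-lam t})^2."""
    return (1.0 - math.exp(-lam * t_h)) ** 2


LAM = 1e-4  # hypothetical unit failure rate per hour

# Check after every 12 h mission cycle, monthly, or yearly.
for interval in (12.0, 24.0 * 30, 24.0 * 365):
    print(f"check every {interval:6.0f} h: "
          f"hazard at end {hazard_1oo2(interval, LAM):.2e}/h, "
          f"P(both failed) {unreliability_1oo2(interval, LAM):.2e}")
```

The shorter the interval between checks, the lower the hazard is kept, which is the quantitative reason behind periodic testing "sometimes several times per day".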

31 Following the Design Studies and Manufacturing
• Test equipment in the operational environment:
  - Quench Protection System operational during the hardware commissioning of the LHC magnets
  - Reliability run starting for the Beam Dumping System, with about 3 months of continuous operation
    - Can give an upper limit on the failure rate of the most critical components, thanks to the redundancy
  - Logging and Post Mortem systems (analysis of events using logging data, and special 'fast' buffers triggered after a beam dump) used during hardware commissioning
• Install similar equipment or components in operational accelerators:
  - Beam Interlock System installed and operational in the LHC injection chain
  - Fast Magnet Current Change Monitor already operational
  - Energy tracking system of the LHC beam dump working for the extraction system of the SPS injector
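A reliability run cannot prove a failure rate, but it does bound it. With k failures observed in a total exposure T (component-hours), the one-sided upper confidence limit on a constant rate is the λ for which observing k or fewer failures has probability 1 - C. For k = 0 this reduces to λ_U = -ln(1 - C)/T, about 3/T at 95% confidence. The component count and run length below are hypothetical examples.

```python
import math


def poisson_cdf(k: int, mu: float) -> float:
    """P[X <= k] for a Poisson variable with mean mu."""
    return sum(math.exp(-mu) * mu ** i / math.factorial(i) for i in range(k + 1))


def rate_upper_limit(k: int, exposure_h: float, confidence: float = 0.95) -> float:
    """One-sided upper confidence limit on a constant failure rate.
    Bisection on mu such that P[X <= k; mu] = 1 - confidence (the CDF decreases in mu)."""
    lo, hi = 0.0, 10.0 * (k + 1) + 10.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if poisson_cdf(k, mid) > 1.0 - confidence:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi) / exposure_h


# Hypothetical: 15 identical components, ~3 months (2160 h) of continuous running.
exposure = 15 * 2160.0
print(f"0 failures: lambda < {rate_upper_limit(0, exposure):.2e} /h (95% CL)")
print(f"1 failure:  lambda < {rate_upper_limit(1, exposure):.2e} /h (95% CL)")
```

Redundancy helps here because a failed redundant unit is detected without necessarily causing an unsafe event, so failures of individual components can be counted during the run.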

32 General Test Procedures
• Before operation with beam:
  - Thorough testing of all installed equipment is required
  - Definition and follow-up of test procedures for the individual equipment
  - The Machine Protection System Commissioning Working Group approves the test procedures
• Tests with beam are required:
  - Define the tests before going into the next beam commissioning phase
  - Example: provoke a quench of a magnet and check the Beam Loss Monitoring signals
  - Measure the delays between detection and the actual beam dump
• Safe beam flag to allow masking of some interlock channels in the case of low-intensity / low-energy beams
How to enforce these tests?
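The safe beam flag can be sketched as a rule layered on top of the AND of the permits: an input declared maskable is ignored only while it is masked and the safe beam flag is set, whereas non-maskable inputs always count. The input names and the exact masking rule are illustrative assumptions, not the LHC implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Input:
    name: str            # hypothetical interlock input name
    permit: bool         # True = "OK to have beam"
    maskable: bool = False
    masked: bool = False


def effective_permit(inp: Input, safe_beam: bool) -> bool:
    """A masked input counts as OK only if it is maskable and the beam is flagged safe."""
    if inp.maskable and inp.masked and safe_beam:
        return True
    return inp.permit


def beam_permit(inputs: List[Input], safe_beam: bool) -> bool:
    """AND of the effective permits of all inputs."""
    return all(effective_permit(i, safe_beam) for i in inputs)


inputs = [
    Input("dump_system", True),  # never maskable
    Input("collimator_position", False, maskable=True, masked=True),
]
print(beam_permit(inputs, safe_beam=True))   # True: masked input ignored for safe beam
print(beam_permit(inputs, safe_beam=False))  # False: masking has no effect
```

Tying the mask to the safe beam flag, rather than letting a mask act on its own, means a forgotten mask stops protecting nothing once the beam intensity or energy becomes dangerous.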

33 Lessons Learned from the Exercise
• Absolute failure rate levels depend largely on the model assumptions, but they do indicate the weak links in the system
  - Confidence in the relative numbers and sensitivity effects
• The hardware of some systems was adapted to obtain reliability numbers similar to those of the other systems
  - Add redundancy
• Periodic testing, sometimes several times per day, will contribute to the safety of the system
  - Test the presence of the assumed redundancy

34 Human Aspects
[Diagram:
  Hardware design: dependability studies, testing of prototypes, testing of series in the laboratory, testing once installed, tests with beam
  Procedures: testing during production, testing after installation
  During operation: confirm redundancy; post mortem to re-establish confidence; when changing hardware; when changing settings
  Gained experience
  A lot of discussions…]

35 Example of Human Aspects
• Beam accident in 2004 while extracting a high-intensity beam from the SPS injector, in which a vacuum chamber was damaged
  - Beam-induced noise on temperature sensors caused a magnet interlock, stopping the magnet power converter
  - Error in the protection logic: the magnet power converter was stopped before extraction was inhibited
  - No clear procedures on what to do: the experiment was continued without sorting out the problem
  - No clear responsibility: several people were in charge at the same time and nobody said 'stop'
• Created a lot of awareness of potential problems for the LHC

36 LHC Strategy Presently under Discussion
• How to change Beam Loss Monitor thresholds and the masking of signals?
  - Thousands of values: avoid errors
  - Are they correct when put in for the first time?
  - Who is allowed to adapt the thresholds?
  - What will the procedures be?
• The Post Mortem Analysis of the Beam Dumping System indicates a fault:
  - What are the procedures to recover?
  - Who can give the 'ok' again?
  - "The same problem happened last month; after 1 day of testing we just continued. We are near the end of the physics run of this year…"
• Who is in charge?
  - Will there be a group of 'safety experts', and what will their role be?

37 Conclusions
• Safety and reliability have become an accepted topic for high-power accelerators
• The LHC has a coherent Machine Protection System, following interdisciplinary work for almost 20 years
• Producing dependability numbers is very time-consuming, and the result depends largely on the model assumptions
• However, the benefits are that:
  - The weak links can be shown
  - Designs have been adapted accordingly
  - Awareness has been raised
• On paper the numbers look good, but testing is required during installation, cold check-outs and operation with beam
• Procedures during normal operation: checks are required almost continuously to confirm the redundancy of the systems
• Procedures in case an abnormality is detected
  - Who is responsible in the control room?
• Organisational issues will be important
  - Enforcing procedures / exceptions

