STANDARD WR STRAW-MAN ARCHITECTURES FOR PHASE II & WRS RELIABILITY


1 STANDARD WR STRAW-MAN ARCHITECTURES FOR PHASE II & WRS RELIABILITY
Diego Real Máñez Time Calibration Workshop (IFIC)

2 Phase II- White Rabbit Switch on DU Base (BASIC)
Time and data together
ON-SHORE STATION
2 x WRS (cross-configured)
OFF-SHORE STATION
Total DU Base ports: 18 ports DOMs + 2 ports (timing + data: 4 colours) + 1 CLB = 21
Total ports available: 2 x 18 = 36

3 Phase II- White Rabbit Switch (REDUNDANCY)
Time and data together
ON-SHORE STATION
Under study: possibility of redundancy with a simple splitter. One CLB is connected to two ports, using the MDIO Control Register, which could disable the tx of a port. (8 + 7 ports available for redundancy; 15 DOMs in total.)
OFF-SHORE STATION
Total DU Base ports: 18 ports DOMs + 2 ports (timing + data: 4 colours) + 1 CLB = 21
Total ports available: 2 x 18 = 36
Free for redundancy: 15 ports (covers 83 % of the 18 DOM ports)

4 Phase II- White Rabbit Switch (TIMING SEPARATED)
Timing (to WR switch on the on-shore station)
Data (to non-WR switch on the on-shore station)
Total DU Base ports: 18 ports DOMs + 2 ports timing + 2 ports data + 1 CLB = 23 (8 colours, difficult to achieve)
Free for redundancy: 13 ports (covers 72 % of the 18 DOM ports)
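The port arithmetic on slides 2 to 4 can be checked with a small sketch. The function name `port_budget` and its parameters are illustrative and do not come from the slides. Coverage is assumed to mean free ports divided by the 18 DOM ports, which reproduces the 83 % and 72 % figures quoted.

```python
def port_budget(dom_ports, uplink_ports, clb_ports=1, switches=2, ports_per_switch=18):
    """Return (used, available, free, coverage) for a pair of switches in the DU base.

    coverage is assumed to be free ports / DOM ports (an interpretation of the slides).
    """
    used = dom_ports + uplink_ports + clb_ports
    available = switches * ports_per_switch
    free = available - used
    coverage = free / dom_ports
    return used, available, free, coverage


# Timing and data together: 2 uplink ports (timing + data, 4 colours)
together = port_budget(dom_ports=18, uplink_ports=2)

# Timing separated: 2 timing ports + 2 data ports (8 colours)
separated = port_budget(dom_ports=18, uplink_ports=4)

for name, (used, available, free, coverage) in [("together", together), ("separated", separated)]:
    print(f"{name}: used={used}, available={available}, free={free}, coverage={coverage:.0%}")
```

Separating timing from data costs two extra uplink ports, which is what lowers the redundancy coverage from 15 to 13 DOMs.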

5 Phase II- White Rabbit Switch
Components: ports, fan, carrier, AC/DC power supply, SCB.
• 18 operational ports: the SCB has 20 routed ports, but only 18 are available.
• Fan, AC/DC power supply and mechanics are not needed for KM3NeT (own mechanics and cooling).
• A small-form-factor carrier with 20 ports is being developed for the CHROMIUM project (it can be used in KM3NeT).

6 Phase II- WRS RELIABILITY - MTBF
Informal data from the WR list:
1.- Failure of an operational switch due to a power supply failure.
2.- At GSI they experienced some problems with fans. Fans are problematic since most generic types have an MTBF of 3 to 5 years at room temperature and much less at elevated temperatures.
2.1.- Power supply failure a couple of times, and fan failures in at least 9 switches. These switches were working in racks at a stable room temperature (21 Celsius).
3.- Problems with a very old switch (3.3 with a small FPGA), also due to the power supply.
4.- At GSI and at Nikhef, EEPROMs were corrupted on some SFPs. It was not discovered whether the problem was caused by the switch or by the SFP itself.
WR hardware for KM3NeT: SCB + CARRIER → Qualification: FIDES + HALT / HASS
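To get a feel for what a fan MTBF of 3 to 5 years implies, the sketch below assumes a constant failure rate (exponential model). That model is an assumption added here for illustration, not a statement from the WR list, and the function names are hypothetical.

```python
import math


def survival_probability(mtbf_years, mission_years):
    """Probability that one unit survives mission_years, assuming a constant failure rate."""
    return math.exp(-mission_years / mtbf_years)


def prob_any_failure(mtbf_years, mission_years, units):
    """Probability that at least one of `units` independent units fails within mission_years."""
    return 1.0 - survival_probability(mtbf_years, mission_years) ** units


for mtbf in (3.0, 5.0):
    p_one_year = 1.0 - survival_probability(mtbf, 1.0)
    print(f"MTBF {mtbf} y: P(one fan fails within 1 year) = {p_one_year:.2f}")
```

Under this model a single fan already has a substantial chance of failing within a year, which is consistent with treating fans as a weak point.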

7 RELIABILITY - FIDES
FIDES is based on reliability engineering.
• Latest updated handbook (based on more than 500 billion hours of operating data)
• Considers environment, quality factors and processes
• Less pessimistic (but be careful not to be optimistic!)
• Provides theoretical FIT (or MTBF) and weak points for improvement
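FIT and MTBF are two views of the same number: 1 FIT is one failure per 10^9 device-hours. The sketch below shows the conversion and a simple series model in which component FITs add up. The component names and FIT values are hypothetical placeholders, not FIDES results, and the helper names are illustrative.

```python
HOURS_PER_FIT_UNIT = 1e9   # 1 FIT = 1 failure per 10^9 device-hours
HOURS_PER_YEAR = 8760.0


def fit_to_mtbf_hours(fit):
    """Convert a failure rate in FIT to an MTBF in hours."""
    return HOURS_PER_FIT_UNIT / fit


def mtbf_hours_to_fit(mtbf_hours):
    """Convert an MTBF in hours to a failure rate in FIT."""
    return HOURS_PER_FIT_UNIT / mtbf_hours


def board_fit(component_fits):
    """Series model: any component failure fails the board, so the FITs add."""
    return sum(component_fits.values())


# Hypothetical component FIT values, for illustration only.
example = {"fpga": 50.0, "dc_dc_converter": 120.0, "sfp_cage": 30.0}

total = board_fit(example)
weakest = max(example, key=example.get)   # largest contributor = weak point
mtbf_years = fit_to_mtbf_hours(total) / HOURS_PER_YEAR

print(f"total FIT = {total}, MTBF = {mtbf_years:.0f} years, weakest = {weakest}")
```

The component with the largest FIT contribution is the natural candidate for the "weak points for improvement" mentioned on the slide.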

8 Reliability: HALT
Highly Accelerated Life Test: a design test used to improve the robustness/reliability of a product through a test-fail-fix process in which the applied stresses are beyond the specified operating limits.
Performing HALT
HALT testing is normally performed in a HALT environmental chamber, a chamber that can simultaneously provide temperature control and vibration to the device under test. It must be possible to apply incremental increases (and decreases) in temperature and vibration to levels in excess of those specified for normal product operation. During testing, it is essential to exercise product operation and ensure functionality. Test setups should be optimized to maximize functional test coverage. The test setup should also allow for remote operation of the test and product from outside of the environmental chamber.

9 Reliability: HALT

10 Reliability: HALT

11 Reliability: HALT

12 Reliability: HALT

13 Reliability: HALT Considerations (Stress Application Ordering):
The order in which stresses are applied is governed by their likelihood of precipitating catastrophic failures. The following order is recommended:
• Decreasing temperature (Phase I to CLB+PB)
• Increasing temperature (Phase I to CLB+PB)
• Increasing vibration
Minimum sample size: multiple samples (at least 2 units)

14 Reliability: HALT Example of temperature profile

15 Reliability: HALT Testing Beyond the Failure Point
The HALT test should not stop when a failure is encountered. If possible, the failure mode should be analyzed and fixed to allow the test to continue beyond the stress level at which the failure occurred. If a fix is not possible, further testing should accommodate the known failure mode.

16 Reliability: HALT Recording Failures:
For each failure identified during testing, the following information should be recorded:
• Failure point
• Failure description
• Root cause of failure mode
• Type of failure (catastrophic or recoverable)
• Class of failure (generic or non-generic)
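The fields listed above map naturally onto a small record type. This is only a sketch: the class and enum names are hypothetical, and the example entry is invented for illustration, not a real test result.

```python
from dataclasses import dataclass
from enum import Enum


class FailureType(Enum):
    CATASTROPHIC = "catastrophic"
    RECOVERABLE = "recoverable"


class FailureClass(Enum):
    GENERIC = "generic"
    NON_GENERIC = "non-generic"


@dataclass
class HaltFailureRecord:
    failure_point: str          # stress level at which the failure appeared
    description: str            # what was observed
    root_cause: str             # root cause of the failure mode
    failure_type: FailureType   # catastrophic or recoverable
    failure_class: FailureClass # generic or non-generic


# Invented example entry, for illustration only.
record = HaltFailureRecord(
    failure_point="temperature step at 85 C",
    description="uplink port lost link, recovered after reset",
    root_cause="not yet determined",
    failure_type=FailureType.RECOVERABLE,
    failure_class=FailureClass.NON_GENERIC,
)

print(record)
```

Using enums for the type and class keeps the log limited to the categories named on the slide.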

17 Reliability: HALT / HASS in KM3NeT
HALT for KM3NeT electronics (Highly Accelerated Life Test)
• Find a HALT (climatic) chamber and apply the HALT procedure to any board (WRS, WRS carrier and other KM3NeT boards).
• Define a document with the HALT procedure for the KM3NeT electronics.
HASS: infant mortality removal (Highly Accelerated Stress Screening)
• Procedure already defined for KM3NeT: KM3NeT_ELEC_PRR_2014_003_Burn_In_Test.docx

18 Reliability: HALT for WRS: First draft
SCB board HALT, together with the carrier (only temperature)

19 Reliability: HALT for WRS: First proposal
SCB board HALT, together with the carrier, starting with the WRS carrier and continuing with the CHROMIUM carrier (only temperature).
1.- Ports on, 9 CLBs or similar and 1 port uplink (rate around 200 Mbit/s)
2.- Measure power consumption
3.- Connect to the configuration port of the SCB
4.- Starting at 25ºC, increase 5ºC per step (10 degrees per minute or the maximum of the climatic chamber)
5.- After each step, check that everything is working as expected
6.- Perform a reset of the SCB
7.- Recheck everything
8.- Continue increasing until failure
9.- If possible, fix the failure and continue increasing the temperature
4bis.- Starting at 25ºC, decrease 10ºC per step, 10 minutes
Ordering 3 SCBs and carriers to start the tests.
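The step-stress part of this proposal can be sketched as a setpoint generator plus a loop that stops at the first failing step. The chamber limits, the function names and the `fake_check` stand-in are all assumptions made for illustration. In a real test, the check would cover the functional checks, the SCB reset and the recheck in steps 5 to 7.

```python
def temperature_steps(start_c, step_c, limit_c):
    """Setpoints from start_c toward limit_c in increments of step_c (sign gives direction)."""
    if step_c == 0:
        raise ValueError("step_c must be non-zero")
    steps = []
    t = start_c
    while (step_c > 0 and t <= limit_c) or (step_c < 0 and t >= limit_c):
        steps.append(t)
        t += step_c
    return steps


def run_step_stress(setpoints, check):
    """Visit each setpoint in order; return the first one where check() fails, else None."""
    for t in setpoints:
        if not check(t):
            return t
    return None


HOT_LIMIT_C = 100    # hypothetical chamber limit
COLD_LIMIT_C = -40   # hypothetical chamber limit

hot = temperature_steps(25, 5, HOT_LIMIT_C)       # step 4: +5 C per step
cold = temperature_steps(25, -10, COLD_LIMIT_C)   # step 4bis: -10 C per step


def fake_check(temp_c):
    """Stand-in for the functional check; pretends the board fails at 90 C and above."""
    return temp_c < 90


first_failure = run_step_stress(hot, fake_check)
print(hot, cold, first_failure)
```

After a failure is fixed (step 9), the same loop can be restarted from the failing setpoint to continue beyond it.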

20 Phase II- DELL Switch
• DELL has provided partial information about the reliability of the boards.
• DELL would also be able to lend an S3124F for a month, to be sent to Granada for tests.

