Monitoring and Diagnostics infrastructure & SW for LHC QPS WorldFIP first analysis Michel Arruat, Frank Locci, Julien Palluel BE/CO


1 Monitoring and Diagnostics infrastructure & SW for LHC QPS WorldFIP first analysis Michel Arruat, Frank Locci, Julien Palluel BE/CO

2 Summary
• Our study focused on one specific bus, DL3J, which QPS reported as a particularly bad case.
• We also observed interesting results on other buses, but we need more information about the faults themselves.
Agenda:
• Infrastructure and configuration
• Results
• Overall observations
• Conclusions
• Solutions

3 Infrastructure and configuration
[Diagram: FEC, optical repeater, repeater, FIPDiag and QPS agents, across surface, tunnel and RE, shown before and after the QPS doubling; after: 88 segments]

4 Infrastructure and configuration
[Figure labels: IP4, IP3]

5 Infrastructure and configuration
• Buses studied:
  • DL3J (errors on all QPS agents)
  • DL3K (few agents)
  • DL4J (same configuration as DL3J)
  • DL3D (parallel to DL3J)
• Bus load:

Segment | Nb agents, 1st subsegment | Nb agents, 2nd subsegment | Total | BA (ms)
DL3D    |                           |                           | 36    | 100
DL3J    | 20                        | 24                        | 44    | 100
DL3K    |                           |                           | 31    | 100
DL4J    | 20                        | 24                        | 44    | 100

6 Infrastructure and configuration
• FIPWatcher (frame analyser) placed at the beginning of the bus.
[Figure]

7 Infrastructure and configuration
• BA table:
  • ID_DAT 3000 time variable
  • WAIT 4 ms
  • ID_DAT 05 commands
  • ID_DAT 00 reads
  • ID_DAT 02 reads
  • ID_DAT 04 reads
  • ID_DAT 06 reads
  • ID_DAT 0100 sync
  • ID_DAT 067F/057F fipdiag variables
  • APER WINDOW (presence, list presence…)
  • SYN_WAIT 100 ms
• NB: If the BA takes too much time, it ignores the next trigger in order to complete its cycle, which lengthens the cycle period.
[Figure labels: 4 ms, 12 ms, 20 ms, 0.20 ms, 0.50 ms, 1.4 ms]
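The NB above can be illustrated with a small timing model. This is only a sketch of one possible reading of the slide: it assumes that a cycle which overruns the 100 ms period is allowed to finish and that the next cycle then starts immediately, so the period stretches to the cycle duration. The function names and the sample durations are hypothetical, not taken from the BA implementation.

```python
def cycle_starts(durations_ms, period_ms=100.0):
    """Start time of each cycle under the assumed overrun rule.

    Assumption: a cycle shorter than the period waits for the next trigger;
    a longer one finishes first, and the following cycle starts right after it.
    """
    starts = [0.0]
    for duration in durations_ms[:-1]:
        starts.append(starts[-1] + max(period_ms, duration))
    return starts


def cycle_periods(durations_ms, period_ms=100.0):
    """Observed period between consecutive cycle starts."""
    starts = cycle_starts(durations_ms, period_ms)
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


# Hypothetical cycle durations in ms: the second cycle overruns by 0.2 ms.
sample_durations = [99.0, 100.2, 99.0, 100.3]
sample_periods = cycle_periods(sample_durations)
print(sample_periods)
```

In this model only the overrunning cycle shows a period above 100 ms, and each overrun adds to a cumulated delay relative to the nominal trigger grid.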

8 Results
On all recorded segments there are no error frames: no bad CRC, no frames that are too long or too short, etc. ==> no electrical problem is disrupting the signal. This is confirmed by the very low error count (almost zero: 2 out of 90 million) seen on FIPDiag (the diagnostic module at the end of the bus).

9 Time between the last RP_DAT and the first ID_DAT (3000): cycle free time
• DL3K
  • With presence list (ID_DAT + RP_DAT = 732 µs + TRs): 28.5 ms
  • Without: 29.9 ms
• DL3J
  • With presence list: 0.16 ms
  • Without: 1.6 ms
• DL4J
  • With presence list: 1.5 ms
  • Without: 3 ms
• DL3D
  • With presence list: 17.6 ms
  • Without: 19 ms
[Figure label: 1.4 ms]
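The free-time figures above can be compared bus by bus with a few lines of arithmetic. The sketch below only reuses the numbers from this slide and the 100 ms BA cycle from slide 5; the variable and function names are illustrative.

```python
CYCLE_MS = 100.0  # BA cycle from the bus-load table

# (free time with presence list, free time without), in ms, from the slide
free_time_ms = {
    "DL3K": (28.5, 29.9),
    "DL3J": (0.16, 1.6),
    "DL4J": (1.5, 3.0),
    "DL3D": (17.6, 19.0),
}


def presence_cost_ms(with_ms, without_ms):
    """Free time consumed by the presence-list exchange."""
    return round(without_ms - with_ms, 2)


def free_share(with_ms, cycle_ms=CYCLE_MS):
    """Fraction of the cycle left free with the presence list enabled."""
    return with_ms / cycle_ms


tightest_bus = min(free_time_ms, key=lambda bus: free_time_ms[bus][0])

for bus, (with_ms, without_ms) in free_time_ms.items():
    print(bus, presence_cost_ms(with_ms, without_ms), f"{free_share(with_ms):.2%}")
print("least free time:", tightest_bus)
```

On every bus the presence list costs roughly 1.4 to 1.5 ms, which matches the 1.4 ms label on the slide, and DL3J is left with well under 1% of its cycle free.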

10 Time between two occurrences of the same ID_DAT > 100 ms (strictly speaking, this includes the jitter)
• DL3K: 9.8%
• DL3J: 100% (cumulated delay of 2 cycles over 1200 cycles)
• DL4J: 2.6%
• DL3D: 1%

DL3K
Time (s) | DT (s)   | Free time (s) | Frame
0.66695  | 0.09987  | -0.00013      | ID_DAT(0567) CRC : AFA7
0.76682  | 0.09987  | -0.00013      | ID_DAT(0567) CRC : AFA7
0.86668  | 0.09986  | -0.00014      | ID_DAT(0567) CRC : AFA7
0.96669  | 0.10001  | 1E-05         | ID_DAT(0567) CRC : AFA7
1.066556 | 0.099866 | -0.00013      | ID_DAT(0567) CRC : AFA7
1.166419 | 0.099863 | -0.00014      | ID_DAT(0567) CRC : AFA7
1.266287 | 0.099868 | -0.00013      | ID_DAT(0567) CRC : AFA7
1.366154 | 0.099867 | -0.00013      | ID_DAT(0567) CRC : AFA7

DL3J
Time (s) | DT (s)   | Free time (s) | Frame
42.09355 | 0.100191 | 0.00019       | ID_DAT(05AE) CRC : 3BE7
42.19376 | 0.100211 | 0.00021       | ID_DAT(05AE) CRC : 3BE7
42.29407 | 0.100311 | 0.00031       | ID_DAT(05AE) CRC : 3BE7
42.39428 | 0.100211 | 0.00021       | ID_DAT(05AE) CRC : 3BE7
42.49446 | 0.100181 | 0.00018       | ID_DAT(05AE) CRC : 3BE7
42.59463 | 0.100171 | 0.00017       | ID_DAT(05AE) CRC : 3BE7
42.69482 | 0.100191 | 0.00019       | ID_DAT(05AE) CRC : 3BE7
42.795   | 0.100181 | 0.00018       | ID_DAT(05AE) CRC : 3BE7
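The kind of analysis behind this slide can be sketched from the timestamps in the two excerpts: take the difference between consecutive occurrences of the same ID_DAT, count how many exceed the nominal 100 ms, and extrapolate the cumulated delay. The function names are illustrative, and the eight rows per bus are far too few to reproduce the percentages above exactly; this is only meant to show the calculation.

```python
NOMINAL_S = 0.100  # nominal BA cycle, in seconds

# Timestamps (s) of the same ID_DAT, copied from the two excerpts above
dl3k_times = [0.66695, 0.76682, 0.86668, 0.96669,
              1.066556, 1.166419, 1.266287, 1.366154]
dl3j_times = [42.09355, 42.19376, 42.29407, 42.39428,
              42.49446, 42.59463, 42.69482, 42.795]


def periods(times):
    """Interval between consecutive occurrences of the ID_DAT."""
    return [later - earlier for earlier, later in zip(times, times[1:])]


def late_fraction(times, nominal_s=NOMINAL_S):
    """Share of cycles whose period exceeds the nominal value."""
    deltas = periods(times)
    return sum(delta > nominal_s for delta in deltas) / len(deltas)


def cumulated_delay_s(times, n_cycles, nominal_s=NOMINAL_S):
    """Delay accumulated over n_cycles if the mean excess stays constant."""
    deltas = periods(times)
    mean_excess = sum(deltas) / len(deltas) - nominal_s
    return mean_excess * n_cycles


print("DL3K late fraction:", late_fraction(dl3k_times))
print("DL3J late fraction:", late_fraction(dl3j_times))
print("DL3J delay over 1200 cycles (s):", cumulated_delay_s(dl3j_times, 1200))
```

On the DL3J excerpt every period is above 100 ms, and the extrapolated delay over 1200 cycles comes out at a few tenths of a second, the same order of magnitude as the 2 cycles quoted on the slide.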

11 Interframes (TR + cable time): total of all interframes for the variables produced by the agents during 1 cycle
• DL3K: 9648 µs
• DL3J: 14593.2 µs
• DL4J: 12857.2 µs
• DL3D: 11471.2 µs
[Figure label: 1.7 ms]
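To put these totals in perspective, they can be related to the agent counts from slide 5 and to the 100 ms cycle. The sketch below uses only values from the slides; the names are illustrative, and the per-agent average is a rough indicator rather than a measured quantity.

```python
CYCLE_US = 100_000.0  # 100 ms BA cycle

# Total interframe time per cycle (µs), from this slide
interframe_us = {"DL3K": 9648.0, "DL3J": 14593.2, "DL4J": 12857.2, "DL3D": 11471.2}
# Total number of agents per segment, from the bus-load table on slide 5
agents = {"DL3K": 31, "DL3J": 44, "DL4J": 44, "DL3D": 36}

# Average interframe time attributable to one agent
per_agent_us = {bus: interframe_us[bus] / agents[bus] for bus in interframe_us}
# Share of the cycle spent in interframes
cycle_share = {bus: interframe_us[bus] / CYCLE_US for bus in interframe_us}
# DL3J and DL4J have the same number of agents, so the difference
# isolates whatever depends on the bus itself.
extra_dl3j_us = interframe_us["DL3J"] - interframe_us["DL4J"]

for bus in interframe_us:
    print(bus, round(per_agent_us[bus], 1), f"{cycle_share[bus]:.1%}")
print("DL3J minus DL4J (µs):", round(extra_dl3j_us, 1))
```

The DL3J minus DL4J difference is about 1.7 ms, which corresponds to the label on the slide.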

12 Overall observations
• Is there enough time to execute the read/write during the callback before the next cycle? On DL4J:
  • ID_DAT 0100 to ID_DAT 3000: 3.6 ms
  • ID_DAT 3000 to first ID_DAT 05: 4 ms
• This can lead to incorrect data, because the callback may access a value that the next cycle has already refreshed. We need to measure how long the callback takes.
• Every bus shows many bad MPS_STATUS values (=04) on read variables, meaning the refreshment is NOT_OK.
• Every bus has some agents answering with FFFFFFFFFF….FFF values. Strange?
• Possible optimization: a single variable of 96 bytes is more efficient than 4 variables of 24 bytes.
[Figure: BA table, as on slide 7: ID_DAT 3000 time variable; WAIT 4 ms; ID_DAT 05 commands; ID_DAT 00, 02, 04 and 06 reads; ID_DAT 0100 sync; ID_DAT 067F/057F; APER WINDOW; SYN_WAIT 100 ms]
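The last point, one 96-byte variable versus four 24-byte variables, can be illustrated with a simple cost model in which every variable exchange pays a fixed overhead (ID_DAT frame, RP_DAT framing, turnaround gaps) on top of its payload. Everything in the sketch below is an assumption for illustration except the 70 µs TR mentioned on the Solutions slide: the 1 Mbit/s bit rate, the 64-bit ID_DAT frame, the 48 bits of RP_DAT framing and the two gaps per exchange are not stated in the slides, and the class name is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BusModel:
    bit_rate_bps: float = 1_000_000.0  # assumption: bus speed
    id_dat_bits: int = 64              # assumption: size of an ID_DAT frame
    rp_dat_overhead_bits: int = 48     # assumption: RP_DAT framing around the payload
    turnaround_us: float = 70.0        # TR value mentioned on the Solutions slide
    gaps_per_exchange: int = 2         # assumption: one gap after ID_DAT, one after RP_DAT

    def exchange_us(self, payload_bytes: int) -> float:
        """Time for one ID_DAT / RP_DAT exchange carrying payload_bytes."""
        bits = self.id_dat_bits + self.rp_dat_overhead_bits + 8 * payload_bytes
        return bits / self.bit_rate_bps * 1e6 + self.gaps_per_exchange * self.turnaround_us

    def saving_us(self, n_variables: int, bytes_each: int) -> float:
        """Time saved by merging n_variables into one variable of the same total size."""
        split = n_variables * self.exchange_us(bytes_each)
        merged = self.exchange_us(n_variables * bytes_each)
        return split - merged


model = BusModel()
four_small_us = 4 * model.exchange_us(24)
one_large_us = model.exchange_us(96)
print(four_small_us, one_large_us, model.saving_us(4, 24))
```

The payload time is the same in both cases, so the saving is three times the fixed per-exchange overhead. With the assumed values that is about 756 µs per group of four variables, of which 420 µs comes from the turnaround gaps alone.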

13 Conclusions
• DL3J is the longest bus. The effect is visible on the interframes, which are longer than on DL4J for the same number of agents. It also has the maximum number of agents; as a result, the BA is overloaded and may exceed the 100 ms cycle (free running).
• Some buses have very little free time. Is it enough to execute the read/write on the card?

14 Solutions
• Increase the BA cycle, to 120 ms for instance.
• Move some agents from DL3J to DL3D, at the risk of saturating DL3D.
• Remove the diagnostic part, but this is undesirable (and may not be sufficient).
• Reduce the TR to a value lower than 70 µs, e.g. 33 µs. Possible?
• Use only one variable of 96 bytes. Possible?
• Explain/understand the agents' refreshment status (MPS status 04) and the FFFFF….FF data.
• ID_DAT order: sometimes in layout order, sometimes not.
• To give the callback more time to compute data, use 2 callbacks with 2 dummy sync variables in order to make 2 groups of ID_DAT.

