Calorimeter Data Monitoring News Benoit Viaud (LAL-in2p3) B. Viaud, Calo Mtg Aug. 31st 2011


1 Calorimeter Data Monitoring News. Benoit Viaud (LAL-in2p3), Calo Mtg, Aug. 31st 2011

2 Overview Reminder: too many alarms make the monitoring inefficient; a survey of the monitors' behavior over 2011; a few proposed improvements.

3 Reminder Marie-Noelle (early May 2011): there are too many alarms issued by the monitoring, and not all of them have real consequences. This lowers the vigilance of the Data Quality shifters, who eventually overlook important issues. I surveyed the 2011 monitoring data to determine which alarms are indeed too noisy and to see what can be done.

4 Survey of the monitors over 2011 Most of the monitoring is based on those monitors [list of monitors shown on the slide], plus a few others based on collision data.

5 Survey of the monitors over 2011 Most of the monitoring is based on those monitors. Quantities such as the PMT response to a LED pulse, the pedestal position, etc. are measured in each cell: a cell is faulty if the average over n events is outside a certain range. The number of faulty cells determines the severity of the conclusion: warning, alarm or fatal. This monitoring is repeated every 10-15 minutes.
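The per-cell check described on this slide can be sketched as follows. This is only an illustration under stated assumptions: the function names, the range `[low, high]` and the three severity thresholds are hypothetical and are not taken from the actual monitoring code.

```python
from statistics import mean


def faulty_cells(cell_values, low, high):
    """Return the ids of the cells whose average over the events is outside [low, high].

    cell_values maps a cell id to the list of values measured over n events
    (hypothetical layout, assumed non-empty for every cell).
    """
    return [cell for cell, values in cell_values.items()
            if not (low <= mean(values) <= high)]


def severity(n_faulty, warning_at, alarm_at, fatal_at):
    """Turn a number of faulty cells into a conclusion (thresholds are assumptions)."""
    if n_faulty >= fatal_at:
        return "fatal"
    if n_faulty >= alarm_at:
        return "alarm"
    if n_faulty >= warning_at:
        return "warning"
    return "ok"


# Toy input: one cell with a shifted pedestal, two normal ones.
cells = {"c1": [10, 11, 9], "c2": [30, 31, 29], "c3": [10, 10, 10]}
bad = faulty_cells(cells, low=5, high=15)
print(bad, severity(len(bad), warning_at=1, alarm_at=3, fatal_at=10))
```

Re-tuning a monitor then amounts to changing `low`/`high` (the range) or the three severity thresholds.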

6 Survey of the monitors over 2011 I analyzed all the 15-minute savesets taken in 2011 (up to Aug. 13th; physics fills only; savesets created automatically at the end of a run are discarded). The goal is to count the number of warnings and alarms issued by each monitor per unit of time (fill), to spot those which "overwhelm" the DM. Action to be taken: to be discussed with the corresponding experts (re-tune the ranges and thresholds to reduce the number of alarms while keeping the calo safe). Correlations are expected among the monitors: confirm them in practice. Correlated monitors can be grouped into a single item to simplify the DM's work. The scripts developed for this study can easily determine the effect of varying the thresholds.
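A minimal sketch of such a count for one monitor, assuming each saveset is reduced to a hypothetical pair (fill number, number of faulty cells). The data layout, the function name and the threshold values are illustrative assumptions, not the scripts actually used for the study.

```python
from collections import defaultdict

LEVELS = ("warning", "alarm", "fatal")


def count_per_fill(savesets, thresholds):
    """Count, per fill, the savesets that are at least in Warning, Alarm or Fatal.

    savesets:   iterable of (fill, n_faulty_cells) for one monitor (assumed layout).
    thresholds: dict level -> minimal number of faulty cells for that level.
    """
    counts = defaultdict(lambda: {"savesets": 0, "warning": 0, "alarm": 0, "fatal": 0})
    for fill, n_faulty in savesets:
        counts[fill]["savesets"] += 1
        for level in LEVELS:
            if n_faulty >= thresholds[level]:
                counts[fill][level] += 1
    return dict(counts)


# Toy data (fill numbers and cell counts are made up for the example).
savesets = [(101, 0), (101, 4), (101, 12), (102, 60), (102, 2)]
nominal = {"warning": 3, "alarm": 10, "fatal": 50}
looser = {"warning": 5, "alarm": 20, "fatal": 50}

print(count_per_fill(savesets, nominal))
print(count_per_fill(savesets, looser))  # effect of varying the thresholds
```

Running the same count with two sets of thresholds gives a direct view of how many warnings and alarms a re-tuning would remove in each fill.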

7 Example: Ecal_Unexpected Signal. NB: all the other monitors are shown in back-up. [Plot: # savesets vs. fill number, showing for each fill the number of savesets in the fill, at least in Warning, at least in Alarm, and in Fatal.]

8 Example: Ecal_Unexpected Signal, normalized to the number of savesets in the fill. [Plot: # savesets (normalized) vs. fill number.] Reference dates: fill 1613: 13-03-2011; fill 1806: 25-05-2011; fill 1944: 14-07-2011; fill 2025: 13-08-2011.

9 Correlated Monitors PedestalChi2 & AveragePedestalNoise alarms are always accompanied by a PedestalNoise alarm; most of the PedestalNoise & PedestalShiftOverNoise alarms are accompanied by a PedestalShift alarm.

10 Ecal_AveragePedestalNoise

11 Ecal_PedestalChi2

12 Ecal_PedestalNoise

13 Ecal_PedestalShiftOverNoise

14 Ecal_PedestalShift

15 Hcal_AveragePedestalNoise

16 Hcal_PedestalChi2

17 Hcal_PedestalNoise

18 Hcal_PedestalShift

19 Correlated Monitors PedestalChi2 & AveragePedestalNoise alarms are always accompanied by a PedestalNoise alarm; most of the PedestalNoise & PedestalShiftOverNoise alarms are accompanied by a PedestalShift alarm. Proposal: group them into a single Pedestal alarm in the DM page, and keep the full picture in the Piquet page for finer diagnostics.
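The two steps on this slide (checking how often one alarm is accompanied by another, then merging the group into a single item) can be sketched as below. The severity encoding (0 = OK, 1 = Warning, 2 = Alarm, 3 = Fatal), the data layout and the function names are assumptions made for the illustration, and the toy savesets are made up.

```python
def accompanied_fraction(savesets, monitor, companion, level=2):
    """Fraction of the savesets where `monitor` is at least at `level`
    in which `companion` is at least at `level` too.

    savesets: list of dicts monitor name -> severity (assumed encoding 0..3).
    Returns None if `monitor` never reaches `level`.
    """
    fired = [s for s in savesets if s.get(monitor, 0) >= level]
    if not fired:
        return None
    both = sum(1 for s in fired if s.get(companion, 0) >= level)
    return both / len(fired)


def grouped_severity(saveset, members):
    """Severity of a grouped item: the worst severity among its members."""
    return max(saveset.get(m, 0) for m in members)


PEDESTAL = ("PedestalChi2", "PedestalNoise", "PedestalShift")

# Toy savesets, made up for the example.
savesets = [
    {"PedestalChi2": 2, "PedestalNoise": 2, "PedestalShift": 0},
    {"PedestalChi2": 0, "PedestalNoise": 2, "PedestalShift": 2},
    {"PedestalChi2": 2, "PedestalNoise": 3, "PedestalShift": 2},
    {"PedestalChi2": 0, "PedestalNoise": 0, "PedestalShift": 0},
]

print(accompanied_fraction(savesets, "PedestalChi2", "PedestalNoise"))
print([grouped_severity(s, PEDESTAL) for s in savesets])
```

Taking the maximum keeps the grouped item at least as severe as any of its members, so the DM does not lose an alarm by looking only at the grouped Pedestal item.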

20 Correlated Monitors LEDNoise & LargeLEDNoise; LowLEDSignal & OutRangeLED & NoGainMonitor

21 Ecal_LEDNoise

22 Ecal_LargeLEDNoise

23 Hcal_LEDNoise

24 Hcal_LargeLEDNoise

25 Ecal_LowLEDSignal

26 Ecal_OutRangeLED

27 Ecal_NoGainMonitor

28 Hcal_LowLEDSignal

29 Hcal_OutRangeLED

30 Hcal_NoGainMonitor

31 Correlated Monitors LEDNoise & LargeLEDNoise; LowLEDSignal & OutRangeLED & NoGainMonitor. Proposal: group them into a single LEDNoise and a single NoGainMonitor.

32 Even vs. Odd in Prs/Spd: group Odd and Even in the DM plots.

33 Proposal: replace this:

34 By this: much simpler for the DM.

35 Noisy Monitors Those which issue a Warning/Alarm/Fatal at least every few days; there are a few of them (see next slides). Summing up all the alarms: something shows up pretty much every day. Now: study the pattern behind those alarms and discuss with the experts to make them quieter while still safe (e.g. optimized ranges and thresholds). The next slides contain my first remarks.

36 Noisy Monitors

40 Noisy Monitors Some alarms appear simultaneously in many monitors. This happens when something a bit dramatic occurred (at least, something must have happened). I guess we want those alarms: we should see to it that they're still there after the monitoring ranges/thresholds have been optimized. Ex: fills 1738, 1743 and 1944. [Plots: HCAL_LEDNoise, ECAL_LEDNoise.]

41 Noisy Monitors Fill 1944: right after LHCb restarted on July 14th, shortly after a power cut. Fill 1743: misconfiguration of ODIN, LED pulsing in a physics BXID.
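One way to verify that such simultaneous alarms survive a re-optimization of the ranges and thresholds is sketched below. It assumes, hypothetically, that for each monitor we have the set of fills in which it issued an alarm, once with the current settings and once with the re-tuned ones. The names and the toy fill numbers are made up for the illustration.

```python
from collections import Counter


def broad_alarm_fills(alarms_by_monitor, min_monitors):
    """Fills in which at least `min_monitors` monitors issued an alarm.

    alarms_by_monitor: dict monitor name -> set of fills with an alarm (assumed layout).
    """
    counts = Counter()
    for fills in alarms_by_monitor.values():
        counts.update(set(fills))
    return {fill for fill, n in counts.items() if n >= min_monitors}


def lost_after_retuning(before, after, min_monitors):
    """Broad-alarm fills (with the old settings) where no monitor alarms any more."""
    still_flagged = set().union(*after.values())
    return broad_alarm_fills(before, min_monitors) - still_flagged


# Toy input: fills 101 and 103 are "dramatic" (two or more monitors in alarm).
before = {"A": {101, 102, 103}, "B": {101, 103}, "C": {103, 104}}
after = {"A": {103}, "B": {103}, "C": set()}

print(broad_alarm_fills(before, min_monitors=2))
print(lost_after_retuning(before, after, min_monitors=2))
```

A non-empty result of `lost_after_retuning` would point to fills where the new settings hide an incident that the current ones catch.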

42 Noisy Monitors: Spd Fake Signal I observe one faulty saveset every few hours, while everything is OK 15 minutes before and after. Instability in the pedestal? Most of the time, it is not very much above the Warning threshold.

43 Noisy Monitors: Prs PedestalMeans Shows up after LHCb restarted on July 14th, following the power cut. FEB11 on crate 2 changed by Stephane?

44 Noisy Monitors: Ecal/Prs Low Occupancy Other alarms are simultaneous for Ecal and Prs. They are always (save one time) due to the very first saveset analyzed in the fill, typically 1 to 5 minutes after the start. PS2FEB11 is visible on the left of the PRS plot. Did they really appear in the alarm section of the presenter? If so, discarding the first saveset would greatly reduce their rate.
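The effect of discarding the first saveset of each fill can be estimated with a count like the one below. It is a sketch under assumptions: each fill is represented by the time-ordered list of severities of its savesets (0 = OK, 1 = Warning, 2 = Alarm, 3 = Fatal), and the toy numbers are made up.

```python
def n_alarmed_savesets(fills, level=2, skip_first=False):
    """Number of savesets at least at `level`, optionally ignoring the first one of each fill.

    fills: dict fill number -> time-ordered list of saveset severities (assumed layout).
    """
    start = 1 if skip_first else 0
    return sum(1 for severities in fills.values()
               for sev in severities[start:] if sev >= level)


# Toy fills: two alarms come from the very first saveset, one from later in a fill.
fills = {201: [2, 0, 0, 0], 202: [2, 0, 0], 203: [0, 0, 2, 0], 204: [0, 0]}

print(n_alarmed_savesets(fills))                   # all savesets
print(n_alarmed_savesets(fills, skip_first=True))  # first saveset of each fill discarded
```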

45 Noisy Monitors Known for a long time. Need to find a way to fix it…

46 Noisy Monitors Known for a long time. Need to find a way to fix it… Examples: 1799 (21/5/11) and 2040 (22/8/11).

47 Summary and Prospects I surveyed the 2011 monitoring data to find ways to reduce the number of alarms to be handled by the Shift Data Manager. Many alarms are correlated or simultaneous: they could be grouped into a single one. This will require a bit of coding (creating new monitoring histos) and is one of my next steps. A few monitors trigger an alarm every few days; combining everything, this means something almost every day. I am presently looking into this to determine whether they can be re-optimized (fewer alarms and still safe). The scripts written for this study can be made available to the Piquets (after a bit of cleaning). They could be used every day to monitor, as a whole, the fills taken in the past 24 hours.

48 Back-up
