Bad Runs Alberto Oliva, July 2012.

1 Bad Runs Alberto Oliva, July 2012

2 Sample
Sample: B584/pass3
Runs: 22804 (26220 files)
Time span: 20/05/2011 ( ) – 17/05/2012 ( ), ~ 1 year
Processed runs: 22802
Errors in processing: 2 ( , )
B584 comes with new tools for the study of bad runs, provided with the new production:
> A. Kounine (xDR error handling)
> M. Duranti (DSP errors)
> V. Choutko (DaqEvent storing)
This analysis was performed on the SEU cluster. The full analysis of all the runs takes less than 1 day.

3 Day-by-day Acquisition Time
Plot labels:
> 24h
> Power reconfig (14/Nov/2011)
> JMDC Hang (25/Nov/2011)
> RDR LV Alarm (24/Dec/2011)
> TPD Problem (2/Dec/2011)
> Nothing in e-log, maybe missing runs (?) (Aug/2011)
> ECAL Desync (15-22/Dec/2011)
> Commissioning, ~ 2 weeks (TOF HV scans, TTCS test, Trigger tests, …). Ended with trigger finalization, 5/Jun/2011 ( )
346 working days (after commissioning) for 334 acquisition days: “Operations Efficiency” ~ 96.3%
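The efficiency figure above is a simple ratio of acquisition time to available time. A minimal sketch of that arithmetic, assuming the rounded day counts quoted on the slide (the function and variable names are illustrative, not from the original analysis):

```python
# Day counts as quoted on the slide (rounded values).
working_days = 346      # days available after commissioning
acquisition_days = 334  # days with data acquisition


def operations_efficiency(acquired: float, available: float) -> float:
    """Fraction of the available time actually spent acquiring data."""
    if available <= 0:
        raise ValueError("available time must be positive")
    return acquired / available


efficiency = operations_efficiency(acquisition_days, working_days)
print(f"Operations efficiency ~ {100 * efficiency:.1f}%")
```

With these rounded inputs the ratio comes out at about 96.5%, close to the ~ 96.3% quoted on the slide. The small difference presumably comes from unrounded day counts in the original computation, which is an assumption.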

4 Bad Runs Lists
DAQ: a general list of known bad runs, provided by M. Paniccia
> Pulser left ON: 17 runs, ~ 5 hours
ECAL: a detailed list from M. Paniccia
> ECAL crate E0 event mismatch (Dec 2011): 491 runs, ~ 7 days
> EDR-1-0-A wrong configuration file (Jul 2011): 14 runs, ~ 4 hours
> EIB RP3 (EDR-1-3 connector 0) power cycle (Sep 2011): 1 run, ~ 3 min
> EIB RP3 (EDR-1-3 connector 0) wrong trigger settings (Sep 2011): 23 runs, ~ 8 hours
> ECAL trigger test (Nov 2011): 1 run, ~ 1 hour
Tracker: Jose provided a list of bad runs
> Bad calibration (done in SAA or polar regions): 135 runs, ~ 1.8 days
TRD: a list will be provided soon.
TOF: from a list of TOF problems compiled by V. Bindi, no runs were tagged as bad.
RICH: an estimator of the run quality will be provided. For the moment, no bad runs from RICH.

5 Run Tag
Plot labels: ECAL Tags; TRD Refills; TOF scans; JMDC Reboot (default RunTag)
Cutting on RunTag would cut away all the refill periods, which roughly account for 1179 runs, ~ 17 days. No cut right now: we will use the TRD list.

6 Event Synchronization
A synchronization error occurs whenever parts of the event come from different triggers. Most (not all) of the synchronization problems are detected during event building.
> A. Kounine provided code to check for this problem offline.
Plot labels: No ECAL Desync and No Commissioning; ECAL Desync (15-22/Dec/2011), only few desync during run (?)
Runs with at least one desync error: 635 runs (~ 11 days)
A big fraction comes from the Dec 2011 ECAL problem (the two cuts overlap). Another big fraction comes from runs with a very low number of desync errors.
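The definition above (parts of one event coming from different triggers) can be illustrated with a toy check. This is only a sketch under stated assumptions: it is not A. Kounine's code, and the event layout (a mapping from a hypothetical sub-detector name to the event number carried by its fragment) is invented for illustration.

```python
from typing import Dict, List


def is_desynchronized(fragment_event_numbers: Dict[str, int]) -> bool:
    """True if the fragments of one built event do not all carry the same event number."""
    return len(set(fragment_event_numbers.values())) > 1


def count_desync_events(events: List[Dict[str, int]]) -> int:
    """Number of events in a run with at least one mismatching fragment."""
    return sum(1 for event in events if is_desynchronized(event))


# Hypothetical run with three built events; the keys are illustrative names.
run = [
    {"tracker": 101, "ecal": 101, "tof": 101},
    {"tracker": 102, "ecal": 101, "tof": 102},  # ECAL fragment belongs to the previous trigger
    {"tracker": 103, "ecal": 103, "tof": 103},
]

n_desync = count_desync_events(run)
run_has_desync = n_desync > 0  # the slide counts runs with at least one desync error
print(n_desync, run_has_desync)
```

A run-level flag of this kind marks a run as soon as a single event is affected, which is why runs with a very low number of desync errors still enter the count.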

7 Program Memory Error
Corruption of the program memory of a DSP. From time to time (rarely) it causes erratic behavior of the board.
> M. Duranti developed the tools needed to check the problem carefully.
Diagram labels: Node is OK; Node is KO; Node is OK; DSP test on node → Status is OK; DSP test on node → Status is KO; Node Boot
Runs with at least one DSP error: 4030 runs, ~ 60 days → Not used. Left as a possibility for an accurate data analysis of the efficiency.

8 Fraction of Missing Events
Comparison between the last event number and the number of events on disk.
> Error on frame transmission (missing, incomplete, corrupted).
> Error on JMDC event transmission (corrupted format, event CRC error).
> Offline production problem.
16587 runs have 0 difference!
Plot labels: Runs with a large number of missing events; 0.1%, 1%, 10%, 100%; Why this shape?
Fraction of missing events > 0.1%: 421 runs (~ 6 days)
Some runs have more events on disk than the last event number → reprocessing duplication (86 runs).
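The comparison described above can be sketched as a per-run fraction with a threshold. This is a hedged illustration: the record layout, the run numbers, and the assumption that the last event number equals the number of events expected on disk are all hypothetical. Only the 0.1% threshold comes from the slide.

```python
from typing import NamedTuple


class RunCounts(NamedTuple):
    run_id: int             # hypothetical run identifier
    last_event_number: int  # last event number seen in the run
    events_on_disk: int     # events actually found on disk


THRESHOLD = 0.001  # 0.1%, the cut quoted on the slide


def missing_fraction(run: RunCounts) -> float:
    """Fraction of events missing on disk; negative if disk holds extra events."""
    if run.last_event_number <= 0:
        raise ValueError("last_event_number must be positive")
    return (run.last_event_number - run.events_on_disk) / run.last_event_number


def classify(run: RunCounts) -> str:
    fraction = missing_fraction(run)
    if fraction < 0:
        return "duplicated"  # more events on disk than expected
    if fraction > THRESHOLD:
        return "bad"
    return "good"


runs = [
    RunCounts(1, 100_000, 100_000),  # no difference
    RunCounts(2, 100_000, 99_950),   # 0.05% missing, below the cut
    RunCounts(3, 100_000, 98_000),   # 2% missing
    RunCounts(4, 100_000, 100_500),  # extra events on disk
]
labels = {run.run_id: classify(run) for run in runs}
print(labels)
```

Treating a negative fraction as its own category keeps the duplicated runs from being silently counted as good.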

9 Fraction of Events with Error
Events collected and stored on disk can be affected by “hardware” problems (ROOM error, desync error, …). Using A. Kounine's code we can check errors on all the DAQ nodes (not only the upper part). The typical error rate is of the order of 0.1%.
Plot labels: Runs with a large number of events with errors; 0.1%, 1%, 10%, 100%
Fraction of events with errors > 1%: 661 runs (~ 10 days)

10 Fraction of Events with no Particles
Events with no error may not have an associated reconstructed particle.
> Trigger on an interaction event (accounted for in the acceptance evaluation).
> Bad trigger configuration (not accounted for).
Plot labels: All coming from 7/Aug/2011 (Pulser On); 1%, 10%, 100%
Fraction of events with No Particle < 1%: 22 runs (8 hours)

11 AMS Zenith Angle
The ISS moves like an “acrobat”, so AMS is not always pointing to the zenith. The code for the angle of the AMS Z axis with respect to the zenith comes from C. Consolandi.
Plot labels: movements during run; high angle
AMS “not vertical”: 74 runs (~ 1 day)
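The angle between the AMS Z axis and the zenith is, geometrically, the angle between two direction vectors. The sketch below shows only that generic computation. It is not C. Consolandi's code, and the example vectors (both expressed in one common, hypothetical reference frame) are invented for illustration.

```python
import math
from typing import Sequence


def angle_deg(u: Sequence[float], v: Sequence[float]) -> float:
    """Angle in degrees between two 3-vectors, from the normalized dot product."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("vectors must be non-zero")
    # Clamp against rounding so acos never sees a value outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical directions in a common frame: zenith along z, detector axis tilted by 30 degrees.
zenith = (0.0, 0.0, 1.0)
tilted_axis = (math.sin(math.radians(30)), 0.0, math.cos(math.radians(30)))

print(f"{angle_deg(tilted_axis, zenith):.1f} deg")
```

A run could then be tagged as “not vertical” when this angle exceeds a chosen limit or changes during the run. The slide does not state the limit used, so none is assumed here.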

12 Conclusion
Cut                        Standalone       In Sequence
Bad DAQ                    17 (~ 0.2 d)
Bad ECAL
Bad Tracker                135 (~ 2 d)      128 (~ 2 d)
Desync                     635 (~ 11 d)     394 (~ 7 d)
Missing Events             421 (~ 6 d)      377 (~ 5 d)
Events with DAQ Error      661 (~ 10 d)     96 (~ 1 d)
Events with no Particle    22 (~ 0.2 d)     1 (~ d)
« AMS Vertical »           74 (~ 1 d)       65 (~ 1 d)
Total: 1608 (~ 25 d) over (~ 334 d) ~ 7.5%
Lists in Vitaly’s format have been created. The TRD bad run list is missing. Some runs should probably be reprocessed. This acquisition time should be further reduced by the SAA exposure (around 15%).
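The difference between the “Standalone” and “In Sequence” columns can be sketched with sets of run numbers: a standalone count applies one cut to the full sample, while an in-sequence count only counts runs not already removed by earlier cuts. The run numbers and bad-run sets below are made up for illustration and do not reproduce the slide's figures.

```python
from typing import List, Set, Tuple


def cut_flow(all_runs: Set[int], cuts: List[Tuple[str, Set[int]]]) -> List[Tuple[str, int, int]]:
    """Return (cut name, standalone count, in-sequence count) for each cut, in order."""
    remaining = set(all_runs)
    rows = []
    for name, bad_runs in cuts:
        standalone = len(bad_runs & all_runs)    # cut applied alone to the full sample
        in_sequence = len(bad_runs & remaining)  # only runs surviving the earlier cuts
        remaining -= bad_runs
        rows.append((name, standalone, in_sequence))
    return rows


# Hypothetical sample of 20 runs with overlapping bad-run lists.
all_runs = set(range(1, 21))
cuts = [
    ("Bad ECAL", {3, 4, 5, 6}),
    ("Desync", {5, 6, 7}),          # overlaps with the ECAL list
    ("Missing Events", {7, 8, 20}),
]

rows = cut_flow(all_runs, cuts)
total_removed = sum(in_sequence for _, _, in_sequence in rows)
for name, standalone, in_sequence in rows:
    print(f"{name:<16}{standalone:>4}{in_sequence:>4}")
print(f"Total removed: {total_removed} of {len(all_runs)}")
```

Because each run is counted once in sequence, the in-sequence column sums to the total number of removed runs, while the standalone column double-counts overlaps. The quoted loss is consistent with the table's time totals: 25 d out of 334 d is about 7.5%.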

