1 FAX UPDATE 26TH AUGUST 2013

2 CONTENT
Running issues
FAX failover
Moving to new AMQ server
Informing on endpoint status
Monitoring developments
Monitoring validation
dCache monitor 5.0.0
Collector
Dashboard
50 shades of green
Ilija Vukotic, ivukotic@uchicago.edu

3 RUNNING ISSUES
Dead endpoints: Frascati, Manchester, LAL.
cmsd services are dead at: Taiwan-lcg2, LPSC, Protvino, SWT2_CPB.
Lookups of /atlas/dq2/user/gangarbt made half of the federation endpoints inaccessible from the upstream redirectors. Johannes will explain this in more detail.
Remaining issues with x509: we still have to communicate our wish to get it turned on at BU, DESY-HH, DESY-ZN, FZK, LRZ-LMU, MPPMU, Freiburg, Wuppertal and Geogrid.

4 RUNNING ISSUES
Rather green, considering it's August!
Quite a bit of traffic, considering it's August!
The new functional HC tests should not contribute much, AFAIK.

5 FAX FAILOVER
FAX failover works: http://pandamon.cern.ch/fax/failover
Developments: the cloud is now shown, queue names have been corrected, and a side menu was added.
In the works: filtering on user; graphing.
To ponder: site admins are not aware of this possibility. How do we communicate to them that it is in their best interest to turn it on?

6 FAX FAILOVER
FAX-dedicated submenu: PanDA-brokered job statistics will be added here.
Production jobs failing over to FAX

7 MOVING TO NEW AMQ SERVER
All FAX-related information was sent to pilot.msg.cern.ch, with no authentication.
We moved to the Dashboard test broker.
The consumer now uses STOMP+SSL, which required a change to a newer stomp version.
This week we will move to the production server.
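The slide only says that the consumer now speaks STOMP over SSL. As a minimal sketch of what that protocol looks like on the wire, assuming STOMP 1.2 framing and a client certificate for TLS, the frames a consumer sends and receives can be built and parsed with the standard library. The broker host, destination name and certificate paths are hypothetical, and this is not the actual FAX consumer code; in practice a client library such as stomp.py handles framing and TLS.

```python
import ssl


def build_frame(command, headers, body=""):
    """Serialise a STOMP 1.2 frame: command, header lines, blank line, body, NUL byte."""
    lines = [command] + [f"{key}:{value}" for key, value in headers.items()]
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def parse_frame(raw):
    """Split one complete frame into (command, headers, body).

    Simplified: ignores header escape sequences and the content-length header.
    """
    text = raw.decode("utf-8")
    if not text.endswith("\x00"):
        raise ValueError("a STOMP frame must end with a NUL byte")
    head, _, body = text[:-1].partition("\n\n")
    command, *header_lines = head.split("\n")
    headers = {}
    for line in header_lines:
        key, _, value = line.partition(":")
        headers.setdefault(key, value)  # STOMP 1.2: the first of repeated headers wins
    return command, headers, body


def make_tls_context(cert_file, key_file, ca_file=None):
    """Client TLS context that presents a certificate, as a STOMP+SSL consumer would."""
    context = ssl.create_default_context(cafile=ca_file)
    context.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return context


# Frames a consumer would send after the TLS handshake (host and destination are hypothetical).
connect = build_frame("CONNECT", {"accept-version": "1.2", "host": "broker.example.org"})
subscribe = build_frame("SUBSCRIBE", {"id": "0", "destination": "/topic/fax.example", "ack": "auto"})
```

The TLS context would wrap the TCP socket to the broker before `connect` is written; after that, every incoming MESSAGE frame can go through `parse_frame`.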

8 INFORMING ON ENDPOINT STATUS
Mailing from SSB works and gives results.
Do we want SAM updates too? What would it take? Who would do it?
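How SSB produces its mails is not described here. As a hedged sketch of the general idea, assuming endpoint statuses are available as simple colour strings per endpoint, a notifier only has to compare two snapshots and compose one message per transition. The function names, status values and addresses are hypothetical; actually sending the message (for example with `smtplib`) is left out.

```python
from email.message import EmailMessage


def status_changes(previous, current):
    """Return (endpoint, old, new) for endpoints whose status changed.

    Endpoints with no previous status (newly added) are not reported.
    """
    changes = []
    for endpoint, new in sorted(current.items()):
        old = previous.get(endpoint)
        if old is not None and old != new:
            changes.append((endpoint, old, new))
    return changes


def build_alert(endpoint, old, new, sender, recipient):
    """Compose a plain-text notification for one status transition."""
    msg = EmailMessage()
    msg["Subject"] = f"FAX endpoint {endpoint}: {old} -> {new}"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content(f"The FAX status of {endpoint} changed from {old} to {new}.\n")
    return msg
```

Reporting only transitions, rather than every red endpoint on every run, keeps site admins from receiving the same mail repeatedly.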

9 MONITORING DEVELOPMENTS
There is a need to check remotely whether cmsd works. We had (and still have) sites showing green for direct access and red for downstream redirection. Investigation shows that their cmsd's are actually dead or not responding.
We need a way to probe cmsd's directly. Andy will look at ways to do it.
New SSB columns to develop: xRootD version, Rucio support, monitoring status.
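The proper way to probe cmsd is still open (Andy is looking into it). As a first-order sketch only, a plain TCP connect shows whether anything is listening on a site's cmsd port at all. The port value below is an assumption: 1213 is commonly used for cmsd, but each site configures its own. A connect check cannot detect a cmsd that accepts connections but no longer answers, so it is a complement to, not a replacement for, a protocol-level probe.

```python
import socket

# Assumed default; the real cmsd port must come from each site's configuration.
CMSD_PORT = 1213


def tcp_probe(host, port=CMSD_PORT, timeout=5.0):
    """True if something accepts a TCP connection on host:port within timeout.

    This only shows that the port is open; a hung cmsd could still accept.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name resolution failure
        return False


def probe_all(endpoints, timeout=5.0):
    """Map each (host, port) pair to its probe result."""
    return {(host, port): tcp_probe(host, port, timeout) for host, port in endpoints}
```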

10 MONITORING VALIDATION
The first step is to validate that the results shown by Matevz's collector are correct. I sent xrootd summary messages to the collector and checked what appears in the plots. The messages arrive and are shown, but something is wrong in how the summaries are calculated or plotted.
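One way to make this check systematic is to compute, from the injected messages, the values the plots ought to show and diff them against what is displayed. The sketch below assumes, as we understand the xrootd summary stream, that reports carry cumulative counters since server start, so per-interval rates come from differencing consecutive reports; a counter that goes down is treated as a restart. Timestamps, counter values and the tolerance are hypothetical test inputs.

```python
def interval_rates(samples):
    """Per-interval rates from time-ordered (timestamp_s, cumulative_counter) samples.

    A counter that goes down is treated as a reset (e.g. server restart),
    so the new value is taken as the amount accumulated since the reset.
    """
    rates = []
    for (t0, c0), (t1, c1) in zip(samples, samples[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # duplicate or out-of-order report: skip it
        delta = c1 - c0 if c1 >= c0 else c1
        rates.append((t1, delta / dt))
    return rates


def mismatches(expected, shown, rel_tol=0.01):
    """List (t, expected, shown) where the plotted value is missing or off by more than rel_tol."""
    bad = []
    for t, value in expected:
        got = shown.get(t)
        if got is None or abs(got - value) > rel_tol * max(abs(value), 1e-12):
            bad.append((t, value, got))
    return bad
```

For example, reports of 0, 6000, 18000 and 600 bytes at 60 s spacing should appear as 100, 200 and 10 bytes/s; any other plotted value points at the summary calculation.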

12 DCACHE MONITOR 5.0.0
The dCache monitor was mostly rewritten:
dCache-compatible logging
UDP messaging from the same ports
Sends the "=" stream
Sends more data (substitutes DN/CN with the username, etc.)
Made compatible with the collector
Tested at MWT2, with very good results.
By the end of the week an RPM will be produced and placed in the WLCG repository. CMS will be informed about the new version.
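Our reading of "UDP messaging from the same ports" is that all monitoring streams leave from one bound socket, so a collector that identifies a reporting server by its source address and port can tie the streams together. A minimal sketch of that pattern, with a hypothetical class name and payloads, and no claim about how the dCache monitor actually does it:

```python
import socket


class MonitorSender:
    """Send every monitoring stream from one bound UDP socket (one source port)."""

    def __init__(self, collector, local_port=0):
        self.collector = collector  # (host, port) of the collector
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(("", local_port))  # 0 lets the OS pick a port once

    @property
    def source_port(self):
        return self.sock.getsockname()[1]

    def send(self, payload: bytes):
        # All streams reuse self.sock, so the collector sees one source port.
        self.sock.sendto(payload, self.collector)

    def close(self):
        self.sock.close()
```

Opening a fresh socket per message would instead give each packet a new ephemeral port, which is exactly what a source-keyed collector cannot correlate.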

13 COLLECTOR
A new version is being prepared by Matevz.
New AMQ version.
BIG ISSUE: some CMS sites are sending info to our collector. This will be raised with Brian B.

14 DCACHE MONITOR 5.0.0
It now gives really important and actionable information. Just during debugging I noticed:
Files opened, only a small percentage read, and then kept open for hours.
The same file opened twice in the same session (?!)
Rather small usage of vector reads.
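The three patterns above can be flagged automatically once per-file access records are available. The sketch assumes a hypothetical record layout (session, path, open/close times, bytes read, file size, plain and vector read counts) and hypothetical thresholds of one hour and 5%; it is not the monitor's actual record format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class FileAccess:
    session: str
    path: str
    open_s: float
    close_s: float
    bytes_read: int
    file_size: int
    reads: int
    vector_reads: int


def long_open_low_read(records, min_open_s=3600, max_read_fraction=0.05):
    """Files kept open at least min_open_s while reading at most max_read_fraction of them."""
    return [
        r for r in records
        if r.close_s - r.open_s >= min_open_s
        and r.file_size > 0
        and r.bytes_read / r.file_size <= max_read_fraction
    ]


def duplicate_opens(records):
    """(session, path) pairs opened more than once."""
    counts = Counter((r.session, r.path) for r in records)
    return sorted(key for key, n in counts.items() if n > 1)


def vector_read_share(records):
    """Fraction of all read requests that were vector reads."""
    total = sum(r.reads + r.vector_reads for r in records)
    return sum(r.vector_reads for r in records) / total if total else 0.0
```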

15 IN DASHBOARD
Why is there a difference between the table and the plots?
What is the idea of the "Site history" tab?
Need to investigate why CMS sites (CERN-CMSTEST) appear here.

16 PANDA RE-BROKERING
Discussed at the last CERN S&C week. We agreed to provide PanDA with an estimate of the cost of moving data over the WAN, so that it can re-broker jobs from very long queues to sites that have free slots and a good connection to the data.
The cost matrix exists in SSB.
Code that reads it from SSB, applies exponential decay smoothing and sends the result to AGIS is running.
We have to check the scalability of the AGIS bulk update.
Waiting for Artem to code moving the data from AGIS to schedconfig.
The next step is for Tadashi to use that table from schedconfig and actually re-broker.
Finally, we'll have to monitor it the same way we do failover.
No developments.
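Exponential decay smoothing keeps one running value per link and blends each new measurement into it, s_new = alpha * x + (1 - alpha) * s_old, so one bad measurement does not swing brokering decisions. A minimal sketch, assuming the cost matrix is a dict keyed by (source, destination) site pairs; the smoothing factor 0.3 and the site labels are hypothetical, and reading from SSB and pushing to AGIS are not shown.

```python
def smooth(previous, observation, alpha=0.3):
    """Exponentially smoothed value; the first observation seeds the series."""
    if previous is None:
        return observation
    return alpha * observation + (1 - alpha) * previous


def update_cost_matrix(smoothed, measurements, alpha=0.3):
    """Blend new (source, destination) -> cost measurements into the smoothed matrix.

    Links without a new measurement keep their previous smoothed value.
    """
    updated = dict(smoothed)
    for link, value in measurements.items():
        updated[link] = smooth(smoothed.get(link), value, alpha)
    return updated
```

A larger alpha reacts faster to real changes in WAN performance; a smaller one suppresses noise more strongly.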

17 50 SHADES OF GREEN
Green in any of the FAX SSB monitoring metrics is based on one and the same file, which involves a lot of cached information.
We need to find the percentage of successfully obtained files from a much larger file pool while avoiding caching effects.
Simple code was developed to test all endpoints that have FDR datasets: it does _file0->ls() on each of the ~800 files, sequentially. It is currently run by hand.
You can find it in FAXtools/FAXtestsFDR in our CERN FAX git repo.
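The aggregation step of such a test can be sketched as follows: probe every file of every endpoint one after another and report the success percentage per endpoint. The probe is passed in, so the logic can be tested without a grid connection. `root_probe` is a hypothetical PyROOT approximation of `_file0->ls()` and is not the code in FAXtools/FAXtestsFDR.

```python
def success_fractions(files_by_endpoint, probe):
    """Percentage of files each endpoint delivers, probing files one by one.

    probe(endpoint, path) returns True on success; any exception counts as
    a failure so that one bad file cannot abort the whole scan.
    """
    results = {}
    for endpoint, paths in files_by_endpoint.items():
        ok = 0
        for path in paths:  # sequential, as in the original test
            try:
                ok += bool(probe(endpoint, path))
            except Exception:
                pass
        results[endpoint] = 100.0 * ok / len(paths) if paths else 0.0
    return results


def root_probe(endpoint, path):
    """Hypothetical probe: open the file over xrootd with PyROOT and list it."""
    import ROOT  # requires a PyROOT installation; imported only when used
    f = ROOT.TFile.Open(f"root://{endpoint}/{path}")
    if not f or f.IsZombie():
        return False
    f.ls()
    f.Close()
    return True
```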

