CERN IT Department: MSG status update, Messaging System for the Grid, First experiences


1 CERN IT Department, CH-1211 Genève 23, Switzerland, www.cern.ch/it
  MSG status update
  Messaging System for the Grid: First experiences
  daniel.rodrigues@cern.ch

2 Summary (Internet Services, MSG Status Update)
  Current Status
  Problems encountered
  First outside experiences
  Near future

3 Current Status
  Broker
    – Started with two simple brokers supporting:
        Org.wlcg.metricDescription [OSG/SAM]
        Org.wlcg.metricOutput [OSG/SAM]
        Org.wlcg.usage.transfer [Gridview Gridftp]
        Org.wlcg.usage.jobStatus [Gridview RB]
        Org.wlcg.usage.jobDetails [LSF / JobWrapper]
        Org.wlcg.management.downtime [OSG]
    – A few problems
        JDBC persistence (legacy from previous versions)
        No failover yet
    – Moving to failover pair using ActiveMQ Message Store
        On gridmsg101 / gridmsg102
        Two steps:
          – 1st: non-OSG producers/consumers
          – 2nd: OSG-related topics
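
The move to a failover pair can be illustrated with a minimal Python sketch. It assumes two things the slide does not state: that JMS clients reach the brokers over ActiveMQ's default OpenWire port (61616), and that non-JMS clients simply try the brokers in order. The `connect` callable is a hypothetical placeholder for whatever client library is in use; only the host names come from the slide.

```python
# Host names are taken from the slide; the port is ActiveMQ's default
# OpenWire port and is an assumption about this deployment.
BROKERS = ["gridmsg101", "gridmsg102"]
OPENWIRE_PORT = 61616


def failover_uri(hosts, port=OPENWIRE_PORT, randomize=False):
    """Build an ActiveMQ failover transport URI for JMS/OpenWire clients."""
    targets = ",".join(f"tcp://{host}:{port}" for host in hosts)
    return f"failover:({targets})?randomize={str(randomize).lower()}"


def connect_with_failover(hosts, connect):
    """Try each broker in order and return (host, connection) for the first success.

    `connect` is a caller-supplied (hypothetical) callable that takes a host
    name and either returns a connection object or raises OSError.
    """
    errors = []
    for host in hosts:
        try:
            return host, connect(host)
        except OSError as exc:
            # Remember the failure and fall through to the next broker.
            errors.append((host, str(exc)))
    raise ConnectionError(f"all brokers failed: {errors}")
```

With `randomize=false` a JMS client prefers the first broker in the list and only moves to the second when the first is unreachable, which matches a primary/standby pair; whether the actual deployment used that setting is not stated in the slides.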

4 Current Status
  Msg-Consume2oracle
    – Normally on the same machine as the broker
        For test purposes and SAM
        1 consumer at sam001.cern.ch for prototype database
        Consumer for Gridview
    – Added several features during the past few months
        System Management
          – Heartbeat
          – Statistics
        Client Acknowledgment
        Failover adaptation
        Logging using RotatingFileHandler
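
For the logging item, a minimal sketch of size-based log rotation with Python's standard `logging.handlers.RotatingFileHandler` is shown here. The logger name, size limit, backup count and message format are assumptions for illustration, not the consumer's actual configuration.

```python
import logging
import logging.handlers


def build_consumer_logger(path, max_bytes=1_000_000, backup_count=5):
    """Return a logger that writes to `path` and rotates it by size."""
    logger = logging.getLogger("msg.consumer")  # hypothetical logger name
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep records out of the root logger

    # Drop handlers from earlier calls so records are not written twice.
    for old in list(logger.handlers):
        logger.removeHandler(old)
        old.close()

    handler = logging.handlers.RotatingFileHandler(
        path, maxBytes=max_bytes, backupCount=backup_count
    )
    handler.setFormatter(
        logging.Formatter("%(asctime)s %(levelname)s %(name)s %(message)s")
    )
    logger.addHandler(handler)
    return logger
```

When the file would exceed `max_bytes`, the handler renames it to `path.1` (shifting older backups up to `backup_count`) and starts a fresh file, so a long-running consumer cannot fill the disk with a single log.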

5 Current Status
  Msg-publish-simple
    – REST:
        OSG (SAM metrics + management downtime)
        UsageTransfer (GridFtp Records)
        JobStatus (RB)
        JobDetails (JobWrapper)
    – STOMP:
        JobDetails (LSF)
    – Running in many different locations
    – Recent updates:
        Lemon Alarms [502 Errors]
        SysLogging / Rotating File Handler
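
To make the STOMP publishing path concrete, the sketch below builds a STOMP 1.0 `SEND` frame by hand. It is not msg-publish-simple's code. The `/topic/` destination prefix is ActiveMQ's convention for topics; the topic name is copied as it appears in the slides, and the record body is an invented example.

```python
def stomp_frame(command, headers, body=""):
    """Encode one STOMP 1.0 frame: command, headers, blank line, body, NUL."""
    lines = [command]
    lines.extend(f"{key}:{value}" for key, value in headers.items())
    return ("\n".join(lines) + "\n\n" + body + "\x00").encode("utf-8")


def send_frame(topic, body):
    """Build a SEND frame addressed to an ActiveMQ topic."""
    return stomp_frame("SEND", {"destination": f"/topic/{topic}"}, body)


if __name__ == "__main__":
    # Hypothetical record; the real jobDetails fields are not given in the slides.
    record = "jobId: 12345\ncpuTime: 42\n"
    print(send_frame("Org.wlcg.usage.jobDetails", record))
```

A real publisher would first send a `CONNECT` frame over a TCP socket to the broker's STOMP port (61613 by default in ActiveMQ), wait for `CONNECTED`, and then write frames like the one above.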

6 Current Status
  System Management
    – Now, every consumer sends:
        Heartbeat
          – Connected
          – Time based
          – Disconnected (when clean)
        Statistics
          – NumberRecords/NumberMessages received/rejected
    – Simple prototype tools:
        Gwt JMS application to XML.
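
The heartbeat and statistics messages listed above could look roughly like the following sketch. The three heartbeat states and the four counters mirror the slide; the JSON encoding, the field names and the `consumer` identifier are assumptions, since the real message format is not shown.

```python
import json
import time
from dataclasses import asdict, dataclass

# States from the slide: on connect, periodically, and on clean disconnect.
HEARTBEAT_STATES = ("connected", "alive", "disconnected")


@dataclass
class ConsumerStats:
    messages_received: int = 0
    messages_rejected: int = 0
    records_received: int = 0
    records_rejected: int = 0

    def count_message(self, records_ok, records_bad=0, rejected=False):
        """Update the counters after handling one message."""
        if rejected:
            self.messages_rejected += 1
        else:
            self.messages_received += 1
        self.records_received += records_ok
        self.records_rejected += records_bad


def heartbeat(consumer_id, state, stats, now=None):
    """Serialise one heartbeat, including the current statistics."""
    if state not in HEARTBEAT_STATES:
        raise ValueError(f"unknown heartbeat state: {state}")
    payload = {
        "consumer": consumer_id,
        "state": state,
        "timestamp": int(time.time() if now is None else now),
        "stats": asdict(stats),
    }
    return json.dumps(payload, sort_keys=True)


def heartbeat_due(last_sent, now, interval_seconds):
    """True when the time-based heartbeat should be sent again."""
    return now - last_sent >= interval_seconds
```

A monitoring tool can then flag a consumer as down when no `alive` heartbeat has arrived for a few intervals and no `disconnected` message was seen.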

7 Problems Encountered
  gridFtp parser publishing:
    – Publisher called in a loop, increasing amount of messages, broker down:
        Only detected after huge CPU consumption;
        Had to roll back
        Irreproducible, but it shows the urgency of having a failover broker + Lemon alarms on the publishers.
    – Had to roll back to previous versions while moving to failover;
  OSG - SAM - Gridview:
    – Consumer down
        Worked like a charm from the messaging point of view;
        But Gridview availability calculations would drop results arriving more than two hours late;
          – Improved by better consumer uptime;
          – And by changing the availability algorithm
  RBs:
    – Conflicts due to old Python version (2.2! May 2003)
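
The runaway gridFtp publisher suggests a safeguard on the publishing side. The slides name Lemon alarms as the remedy; the sketch below is a different, purely illustrative technique: a sliding-window rate guard that a publisher could consult before each send. The class name and limits are hypothetical.

```python
from collections import deque


class PublishRateGuard:
    """Allow at most `max_messages` sends per `window_seconds` (sliding window)."""

    def __init__(self, max_messages, window_seconds):
        if max_messages <= 0 or window_seconds <= 0:
            raise ValueError("limits must be positive")
        self.max_messages = max_messages
        self.window_seconds = window_seconds
        self._sent = deque()  # timestamps of recent sends, oldest first

    def allow(self, now):
        """Return True and record the send if it fits in the window."""
        cutoff = now - self.window_seconds
        # Forget sends that have fallen out of the window.
        while self._sent and self._sent[0] <= cutoff:
            self._sent.popleft()
        if len(self._sent) >= self.max_messages:
            return False  # caller should stop publishing and raise an alarm
        self._sent.append(now)
        return True
```

A publisher stuck in a loop would hit the limit within one window and could log an error or trigger an alarm instead of flooding the broker.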

8 First Outside Experiences
  LSF / JobWrapper
    – Working with Ulrich Schwickerath;
        Very good collaboration;
        Closed loop
    – Idea: measure differences between the CPU time observed by the experiments and by LSF
        Need a hybrid solution for publishing:
          – REST/STOMP (LSF results sent all in one batch, every 24h)
        Quickly developed a prototype, up within a few days;
    – LHCb showed quite a lot of interest in being a pilot;
    – Easily developed a Perl script to wrap msg-publish-simple, now available for instrumenting jobs;
    – Changing or adding metrics/variables is simple and non-conflicting (clients altered independently, no interface versioning headache).

9 First Outside Experiences
  Easy deployment in 3 steps:
    – Agree on the semantics of the records
    – Install and configure msg-publish
    – Send messages by calling msg-publish
  Some preliminary results:
    – We tracked some users using job arrays!
    – Which was good, and forced us to add some information to the records
        For instance, quickly extracted the CPUUSAGE/WALLTIME difference between LSF and JobWrapper
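
The CPUUSAGE/WALLTIME comparison between the two sources might be computed as in this sketch. It assumes that records from both sides can be matched on a job identifier and carry CPU and wall-clock times in seconds, and it reads "difference" as a per-field subtraction; the record layout and field names are hypothetical, and comparing the CPU/wall ratio instead would be an equally plausible reading.

```python
def compare_sources(lsf_records, wrapper_records):
    """Return per-job CPU and wall-time differences (LSF minus JobWrapper).

    Both arguments map a job id to {"cpu": seconds, "wall": seconds}.
    Jobs that appear in only one source are skipped.
    """
    differences = {}
    for job_id in sorted(lsf_records.keys() & wrapper_records.keys()):
        lsf = lsf_records[job_id]
        wrapper = wrapper_records[job_id]
        differences[job_id] = {
            "cpu_diff": lsf["cpu"] - wrapper["cpu"],
            "wall_diff": lsf["wall"] - wrapper["wall"],
        }
    return differences


if __name__ == "__main__":
    # Invented numbers, for illustration only.
    lsf = {"1001": {"cpu": 100.0, "wall": 120.0}}
    wrapper = {"1001": {"cpu": 90.0, "wall": 118.0}}
    print(compare_sources(lsf, wrapper))
```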

10 Near Future
  MSG usage related:
    – Currently, changing over to the failover pair
        Follow closely so it’s smooth
    – Support jobDetails/jobStatus development
        LHCb pilot tryouts
        …
    – Support new message classes
        APEL
        Nagios
        …
  Dissemination:
    – !Documentation!
    – Presentations to community (EGEE 08 Poster)
  Internal changes:
    – System Management improvement (publishers should also send information)
    – Moving to queues where suitable, with tapping to topics.
    – Creation of a network of brokers (with Emir/OSG, and maybe RAL)
    – Test brokers other than ActiveMQ (a few solutions have shown improvements and we are keeping an eye on them, RabbitMQ for instance)

11 Thank you for your attention.
  Q&A

