
Stefano Belforte INFN Trieste 1 Middleware February 14, 2007 Resource Broker, gLite etc. CMS vs. middleware.




1 Resource Broker, gLite etc.
CMS vs. middleware
Stefano Belforte, INFN Trieste, February 14, 2007

2 Middleware for CMS
- Event Data:
  - Catalogs are CMS-made: they need to be tailored to the experiment
- Non-Event Data:
  - Access outside CERN via HTTP + standard web caches (Squid)
- Data transfer
  - Middleware provides the storage: SRM v2.2
  - Middleware provides the File Transfer Service
  - CMS moves datasets on top of that: PhEDEx
- Running jobs
  - Middleware provides remote job submission
  - LCG RB, gLite WMS, Condor-G
  - CMS embeds that into CMS user workflows
  - CRAB, CRAB Analysis Server, ProductionAgent
- Resource sharing (job priorities and all that)
  - In the near future: managed at the sites
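As a rough illustration of the non-event-data access pattern on this slide (plain HTTP through a standard web cache), the sketch below builds a Python HTTP client whose requests are routed through a Squid proxy. The proxy address, the server URL in the comment, and the function name are hypothetical and are not taken from the talk; this is only one way such access could be set up.

```python
import urllib.request

# Hypothetical site-local Squid cache; 3128 is Squid's default listening port.
SITE_SQUID = "http://squid.example.org:3128"


def make_cached_opener(proxy_url: str = SITE_SQUID) -> urllib.request.OpenerDirector:
    """Return an opener whose plain-HTTP requests go through the given proxy."""
    proxy = urllib.request.ProxyHandler({"http": proxy_url})
    return urllib.request.build_opener(proxy)


opener = make_cached_opener()

# Usage (not executed here; the server URL is an assumption as well):
# with opener.open("http://conditions.example.org/calibration/run/1234") as resp:
#     payload = resp.read()
```

Because the cache sits between the job and the central server, repeated requests for the same object from many jobs at one site can be answered locally instead of going back to CERN each time.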

3 Issues for 2007
- Data Management: SRM v2
  - site interoperability
  - better control at Tier1 of disk/tape, pin/unpin
  - New FTS and some changes needed in PhEDEx
- Job Priorities: only a configuration/deployment issue
  - Have asked for 3 “service classes” at all sites
  - software manager: express queue
  - production: up to 50% of resources
  - normal users: all the rest, fair-share based; static mapping will help
- Job Submission: still a big issue
  - LCG RB is slow (~one job/minute)
  - LCG RB chokes at ~5K jobs/day vs. 200K/day target for 2008
  - gLite WMS: much promised, still not in production after 2 years
  - Condor-G: fast and basic (too basic?)
  - Will the CE be the next bottleneck?
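To put the job submission figures on this slide side by side, here is a small back-of-the-envelope calculation. It uses only the numbers quoted above (about one job per minute, about 5K jobs/day per LCG RB, a 200K jobs/day target) and assumes, purely for illustration, that throughput adds up linearly across independent brokers, which the talk does not claim. The function names are made up for this sketch.

```python
import math

MINUTES_PER_DAY = 24 * 60


def jobs_per_day(jobs_per_minute: float) -> float:
    """Convert a sustained submission rate into jobs per day."""
    return jobs_per_minute * MINUTES_PER_DAY


def instances_needed(target_per_day: float, capacity_per_day: float) -> int:
    """Instances required if capacity added up linearly (an assumption)."""
    return math.ceil(target_per_day / capacity_per_day)


TARGET_2008 = 200_000      # jobs/day target quoted on the slide
RB_DAILY_LIMIT = 5_000     # jobs/day at which one LCG RB chokes

# One serial submission stream at ~1 job/minute.
serial_stream = jobs_per_day(1.0)

# Brokers needed at the choke limit, and serial streams needed at 1 job/minute.
brokers = instances_needed(TARGET_2008, RB_DAILY_LIMIT)
streams = instances_needed(TARGET_2008, serial_stream)

print(f"one stream: {serial_stream:.0f} jobs/day")
print(f"brokers needed at the daily limit: {brokers}")
print(f"serial streams needed: {streams}")
```

Under these assumptions a single submission stream manages 1440 jobs/day, and the 2008 target corresponds to roughly 40 brokers running at their daily limit, which is why the slide calls job submission a big issue.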

4 CMS plans
- For 2007, middleware integration and testing for CMS is tackled within the Computing Commissioning sub-project (i.e. S.B.)
- Work on the current issues (especially scaling up the Job Submission tools) will be tackled jointly with OSG collaborators
- This means everybody checks their own tools, but we compare results, possibly using the same test suite, and will jointly pick the best solution for each use case
- A workplan for the next 6 months has been outlined
  - CMS-Italy and INFN have responsibility for testing the gLite tools
  - gLite WMS
  - gLite CE
  - CREAM CE (the next all-Italian computing element)

5 From Computing Commissioning Plan
- SRM v2.2
  - Make sure CMS can use the new SRMs
- gLite 3.x
  - New WMS, new gLite CE
  - gLite 3.1 single job (CMS), bulk submission (ATLAS)
  - Better error reporting in UI (important for dashboard)
- OSG
  - Stress test of various job submission tools
  - Stress test of current and future OSG CEs
  - Stress test of dCache
- Job priorities
  - Verify that this is consistently deployed and works
- Interoperability
  - Keep OSG and EGEE interoperating
  - Integrate NDGF aka NorduGrid
  - Condor-G submission to work for EGEE sites

6 Work program on gLite WMS
- gLite WMS to replace LCG-RB for single job submission
  - Better scalability, faster submission, additional features
  - Tested already to 1-2K jobs/day continuously, 5K for short times
  - Work by EIS team (Andrea Sciabà and Enzo Miccio)
  - Time to use it with Production
- gLite WMS for bulk submission: higher performance
  - Stress test until April by EIS team
  - Already available in CRAB (but not advised for general users)
  - Work in progress to integrate in Production Agent
  - Carlos, Ale, Giuseppe, William
- gLite CE
  - EIS team to add them to test suite, easy
  - Expect better reliability and error reporting
  - Work for March, April, May
- CREAM CE
  - Use same test suite, easy to add, have to see how it works
  - From April onward

7 Status of gLite WMS
- Bulk submission from UI to WMS is fast
- Problem so far is that WMS dies under its own load
  - Could make 20K jobs/day, but not day after day
  - Not as simple as “reboot it”. Needs specific actions (kill processes, restart processes, clean hung jobs, clean logs) every day or so. Not viable for production
  - Current “production” version gLite 3.0: no way
- Crash effort started last fall on gLite 3.1
  - One machine at CERN under stress by ATLAS (same pattern as CMS, using Andrea Sciabà’s test suite)
  - Enormous work and progress by developers in the last months: many components improved, including new Condor versions and processes that terminate themselves after some time. Tons of new patches
- As of last week it submits ~15K jobs/day using bulk submission continuously (5 days in a row by now)
- More robustness expected after rewriting one critical piece to avoid Condor DAGMan (work by F. Giacomini, almost finished)

8 Conclusion
- Resource Broker is not yet what one would like
- Still, it may be almost there
  - Future is in the hands of CMS-Italy (and ATLAS-Italy)
- Keeping the Grid filled from a few submission points (lxplus, a few ProductionAgents) will be a daunting task anyhow
- One hammer does not fit all screws
  - Do not be surprised if in the end different submission tools better serve different use cases
  - CRAB and Production Tools developers will make that transparent to users
- Do not panic at cryptic Grid error messages, we will analyse the data

