Status of gLite-3.0 deployment and uptake. Ian Bird, CERN IT. LCG-LHCC Referees Meeting, 29th January 2007.

1 Status of gLite-3.0 deployment and uptake
Ian Bird, CERN IT
LCG-LHCC Referees Meeting, 29th January 2007

2 Deployment History
October 7, 2005; Ian.Bird@cern.ch; LHCC Referees Meeting; January 29th 2007
- gLite-3.0 was delivered in May 2006
  - Rapidly deployed to Tier 1 sites
  - After the 1st update (3.0.1), full deployment across EGEE
- Two full update releases
  - 3.0.1 and 3.0.2, delivered in June and August
- Change to incremental updates (move away from big-bang releases)
  - 12 updates to 3.0.2, rapidly deployed by all sites
- Anticipate major releases only for major changes:
  - e.g. 3.1.0 will be SLC4
  - But even then will avoid major functional or behavioural changes

3 Comments
- No real distinction between what was LCG-2.7 and gLite-3.0
  - Mainly evolved versions of existing services
  - Some “new” services: they existed before but are now in use (e.g. VOMS)
- The introduction of gLite-3.0 was not disruptive to the production service
  - Although it did cost effort from the sites!
- Most gLite-3.0 services are deployed only in EGEE; exceptions are:
  - FTS, deployed also in US Tier 1s and NDGF
  - WMS/LB, used by CMS to submit work across EGEE and OSG
  - VOMS

4 Status of major components
- VOMS:
  - VOMS service in full production; old LDAP-based VO services stopped
  - VOMS roles and groups:
    - FTS already supports roles and groups
    - DPM supports roles, groups and ACLs
    - dCache 1.7 supports roles and groups at disk-pool level; ACLs mid-year
    - Castor: no real estimate yet
  - Job priorities: VOViews and batch system support being tested now; supported by gLite WMS
- R-GMA:
  - Used as back-end of APEL (accounting)
  - Used as monitoring transport mechanism
  - … and hence used by dashboards

5 Status – 2
- FTS:
  - Used by all experiments; deployed at Tier 0 and all Tier 1s
  - Rapid cycle of fixes for issues found in Service Challenges and ongoing use
  - Most major issues (e.g. fat clients) have been addressed
  - Version 2.0 supports SRM v2.2
- LFC:
  - In production, used by ATLAS and LHCb
  - Deployed as both central and local file catalogues
  - Major issues addressed:
    - Python API problems
    - Bulk queries (can now achieve 300 Hz)
- GFAL/lcg-utils: main SRM clients
  - Used by all experiments
  - Updated to support SRM v2.2

6 Status – 3
- WMS/LB:
  - ATLAS and CMS rely on gLite WMS functionality, particularly the job collections
    - CMS can still use LCG-RB for MC production; not an option for ATLAS
  - LHCb and ALICE use LCG-RB or gLite WMS as basic submission tool
  - Major testing effort with CMS (and ATLAS participation) in Q3 2006 to get the WMS into a state usable in CSA06
    - Testing showed that rates of ~26k jobs/day are feasible from a single node in quiet conditions
    - Extended testing now shows that memory consumption limits this to ~10k jobs/day
    - In CSA06, CMS achieved workloads of 5-8k jobs/day on each of 2 WMS nodes, but was limited by bottlenecks in CMS components (later fixed)
  - In use by ATLAS MC production since July, at rates up to 4k jobs/day on a single WMS node
  - Major issues now are:
    - Reliability of the service is not adequate for service managers or production managers
  - Starting second phase of testing now, with full ATLAS and CMS participation
    - ATLAS is more sensitive to reliability issues in its production
  - Aim is:
    - reliable operation (equivalent to LCG RB stability)
    - with 50K jobs/day in Q2 2007 and 100-150K jobs/day on <10 nodes by end of 2007
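
The WMS rates quoted on this slide are easier to compare when put on a common footing. The following is a minimal sketch, not part of the original presentation: the function names are illustrative, and the node count of 10 is an assumption standing in for the slide's "<10 nodes" (with fewer nodes the required per-node rate is higher).

```python
# Back-of-the-envelope comparison of the WMS job rates quoted on the slide.
# Only the jobs/day figures come from the slide; everything else is illustrative.

SECONDS_PER_DAY = 24 * 60 * 60


def jobs_per_second(jobs_per_day):
    """Convert a daily job rate into an average submission rate in Hz."""
    return jobs_per_day / SECONDS_PER_DAY


def per_node_rate(total_jobs_per_day, nodes):
    """Average daily rate each node must sustain to reach a total rate."""
    return total_jobs_per_day / nodes


# Figures from the slide (jobs/day).
single_node_quiet = 26_000      # feasible in quiet conditions
memory_limited = 10_000         # limit seen in extended testing
target_low, target_high = 100_000, 150_000  # end-of-2007 aim

# Assumption: exactly 10 nodes, the upper bound implied by "<10 nodes".
assumed_nodes = 10

print(f"quiet single node : {jobs_per_second(single_node_quiet):.2f} jobs/s")
print(f"memory-limited    : {jobs_per_second(memory_limited):.2f} jobs/s")
print(f"target per node   : {per_node_rate(target_low, assumed_nodes):.0f}"
      f" to {per_node_rate(target_high, assumed_nodes):.0f} jobs/day")
```

Under that assumption the end-of-2007 aim works out to at least 10k jobs/day per node, which is at or above the memory-limited rate seen in extended testing.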

7 Status – 4
- CE:
  - Not widely deployed, but testing has shown it to be now reasonably reliable
    - There were a number of problems initially
  - BUT:
    - A Condor bug limited the number of jobs in a batch system to 100 (!)
    - Now have a fix for this; serious scale testing is starting
  - Phasing out the LCG CE is not so urgent:
    - Experiments want stability
    - gLite CE brings limited additional functionality (passing of job resource requirements to the batch system)
    - Need to avoid porting LCG-CE to SLC4 if possible

8 Summary
- gLite-3.0 is the production middleware on EGEE
  - Some services used elsewhere also
- All services are in production use
- Testing efforts for WMS/LB and CE still needed to get to desired performance and reliability levels

