LCG 3D Project Update (given to LCG MB this Monday) Dirk Duellmann CERN IT/PSS and 3D


1 LCG 3D Project Update (given to LCG MB this Monday) Dirk Duellmann CERN IT/PSS and 3D http://lcg3d.cern.ch

2 LCG 3D Service Architecture (LCG 3D Status, Dirk Duellmann)
[Architecture diagram; recoverable labels:]
T0 - autonomous reliable service
T1 - db backbone - all data replicated - reliable service
T2 - local db cache - subset data - only local service
Online DB - autonomous reliable service
Connection labels: Oracle Streams; http cache (SQUID); cross DB copy & MySQL/SQLite files
R/O access at Tier 1/2 (at least initially)

3 LCG Database Deployment Plan
After the October ‘05 workshop, a database deployment plan was presented to the LCG GDB and MB
–http://agenda.cern.ch/fullAgenda.php?ida=a057112
Two production phases
March - Oct ‘06: partial production service
–Production service (parallel to existing testbed)
–H/W requirements defined by experiments/projects
–Based on Oracle 10gR2
–Subset of LCG tier 1 sites: ASCC, CERN, BNL, CNAF, GridKA, IN2P3, RAL
Oct ‘06 onwards: full production service
–Adjusted h/w requirements (defined at summer ‘06 workshop)
–Other tier 1 sites joined in: PIC, NIKHEF, NDGF, TRIUMF

4 Oracle Licenses For Tier 1s
3D collected license needs from experiments and s/w projects
–And validated them with the T1 site representatives
–Some 152 processor licenses (incl. Grid Services and Castor)
CERN has negotiated with Oracle a proposal with very attractive conditions
T1 sites have been contacted and agreed to the proposal
–FNAL had already acquired their licenses
Tier 1 sites should now be covered for s/w and support!
–Support accounts for Tier 1s will be enabled as soon as papers have been signed and the site contribution has been received.

5 Oracle Instant Client Distribution
The issue of client distribution has been discussed with Oracle and an agreement has been reached
The instant client can be integrated into the LCG middleware and application distributions
–As long as the included license file is preserved and adhered to
–The SPI project in the Application Area will from now on bundle the software as part of AA releases.
Experiments and LCG middleware should take advantage and pick up validated client releases from this single source.
–Version management will happen as for other AA packages via the established channels for external packages

6 Tier 1 Hardware Setup
Propose to set up for the first 6 months
–2-3 dual-CPU database nodes with 2GB or more
  Set up as RAC cluster (preferably) per experiment
  ATLAS: 3 nodes with 300GB storage (after mirroring)
  LHCb: 2 nodes with 100GB storage (after mirroring)
  Shared storage (e.g. FibreChannel) proposed to allow for clustering
–CMS: 2-3 dual-CPU Squid nodes with 1GB or more
  Squid s/w packaged by CMS will be provided by 3D
  100GB storage per node
  Need to clarify service responsibility (DB or admin team?)
Target s/w release: Oracle 10gR2
–RedHat Enterprise Server to ensure Oracle support

7 LCG 3D - Tier 1 Database Setup
Phase 1 Sites
–ASGC, BNL, CNAF, IN2P3, RAL - DB cluster available, part of 3D throughput tests
–GridKA - DB cluster available, expected to join soon
Phase 2 Sites
–TRIUMF - regular attendance in 3D planning meetings
–PIC, NIKHEF/SARA - DBA contact established
–NDGF - early discussions

8 LCG 3D Throughput Tests
Initially scheduled for May:
–Use the production database clusters at tier 1 and obtain a first estimate for the replication throughput which can be achieved with the setup
–E.g. as input to experiment models for calibration data flow
Tests started at the beginning of May, but will extend until the end of June
Main reasons
–Server setup problems (often db storage) at sites
–Firewall configuration
–Throughput optimization needs Oracle experts to be involved

9 June Test Setup
Now tests are split into
–Basic site tests with all T1s
–LAN throughput optimization with Oracle (CERN local)
  ATLAS and CMS online groups involved
–WAN throughput (CERN-CNAF)
  LFC team and LHCb involved
–Monitoring setup tests (CERN - RAL)
Significant load on one central expert at CERN
–Trying to offload with the help of the database service team

10 Database Monitoring
A central Oracle Enterprise Manager repository at CERN has been set up to collect the status and detailed diagnostics of all 3D production clusters.
–Some sites will in parallel have the information integrated into their site-local OEM setups
Integration into higher level monitoring tools will be based on this information
–E.g. experiment dashboard, GridView
In addition, test jobs (a la SFT) are planned as soon as higher level services (e.g. COOL) are deployed by experiments

11 Preliminary Streams Performance
Out of the box replication performance
–Between 10 and 30 MB/min (LAN) for a single conditions insert job
–Sufficient for current experiment conditions scenarios
–Still quite some room for optimisation as neither source nor destination DB seem to be significantly loaded
Regular phone calls between Oracle and 3D DBA
–Head of streams development attending
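As a rough aid for reading the replication rates quoted on this slide, the following Python sketch converts MB/min into MB/s and GB/day. It assumes decimal units (1 GB = 1000 MB) and a rate sustained over 24 hours; both are illustrative assumptions, not measurements from the tests, and the function names are hypothetical.

```python
def mb_per_second(mb_per_min: float) -> float:
    """Convert a rate in MB/min to MB/s."""
    return mb_per_min / 60.0


def gb_per_day(mb_per_min: float) -> float:
    """Convert MB/min to GB/day, assuming a sustained rate and 1 GB = 1000 MB."""
    minutes_per_day = 60 * 24
    return mb_per_min * minutes_per_day / 1000.0


# The two bounds quoted on the slide for a single conditions insert job.
for rate in (10, 30):
    print(f"{rate} MB/min = {mb_per_second(rate):.2f} MB/s "
          f"= {gb_per_day(rate):.1f} GB/day")
```

Under these assumptions the quoted range corresponds to roughly 14 to 43 GB per day for a single job, which can be compared against an experiment's expected conditions data volume.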

12 Frontier/SQUID Tests
Stress tests with many clients at CERN to validate production setup for CMS
–DNS based failover validated
–Setting up an additional node for ATLAS Frontier tests with COOL
CMS Frontier tests at FNAL
–Focus on connection retry and failover
–Common s/w component shared with database access
–CMS July release may pick up the LCG AA s/w release which is currently being prepared
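The connection retry and failover behaviour mentioned on this slide can be illustrated with a generic client-side sketch in Python. This is not the Frontier client's implementation or API: `fetch_with_failover`, its parameters, and the injected `fetch` callable are all hypothetical names chosen for illustration, and the DNS based failover validated at CERN works at a different level (name resolution) than this loop.

```python
import time
from typing import Callable, Optional, Sequence


def fetch_with_failover(servers: Sequence[str],
                        fetch: Callable[[str], bytes],
                        retries_per_server: int = 2,
                        delay_s: float = 0.0) -> bytes:
    """Try each server in order, retrying a few times before failing over.

    `fetch` is a caller-supplied function (hypothetical) that contacts one
    server and raises ConnectionError on failure.
    """
    last_error: Optional[Exception] = None
    for server in servers:
        for _attempt in range(retries_per_server):
            try:
                return fetch(server)
            except ConnectionError as exc:
                last_error = exc
                time.sleep(delay_s)  # simple fixed pause between attempts
    raise ConnectionError(f"all servers failed: {last_error}")
```

Passing the transport in as a callable keeps the retry policy separate from how a server is actually contacted, so the same loop can be exercised with a fake `fetch` in tests.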

13 CMS Squid Deployment
Squids deployed at several Tier 1 and 2 sites.
–Tier 1: LCG: ASCC, IN2P3, PIC, RAL, CNAF, FZK, CERN and OSG: FNAL
–Tier 2: LCG: Bari, CIEMAT, DESY, Taiwan, OSG: UCSD, Purdue, Caltech
Remaining Tier 2 sites for SC4 goals:
–LCG: Belgium, Legnaro, Budapest, Estonia, CSCS, GRIF, Imperial, Rome, Pisa, NCU/NTU, ITEP, SINP, JINR, IHEP, KNU and OSG: Florida, Nebraska, Wisconsin, MIT.
–In progress, plan to finish in the next month.
Request WLCG support for Tier-2 Squid installation
–Minimum specs: 1GHz CPU, 1GByte mem, GBit network, 100 GB disk.
–Needs to be well connected (network-wise) to worker nodes, and have access to WAN and LAN if on Private Network.
–Having 2 machines for failover is a useful option for T-2 but not required.
–Sites perform Squid installation using existing instructions
–Installation should be completed and tested by July 15, 2006.
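A site could check a candidate machine against the minimum Squid specs listed on this slide with something like the Python sketch below. The thresholds are taken from the slide (reading "GBit network" as 1 Gbit/s); the `HostSpec` class, its field names, and the `shortfalls` function are hypothetical and not part of any CMS or 3D tooling.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class HostSpec:
    cpu_ghz: float
    mem_gb: float
    net_gbit: float
    disk_gb: float


# Minimum specs from the slide: 1 GHz CPU, 1 GByte mem, GBit network, 100 GB disk.
MINIMUM = HostSpec(cpu_ghz=1.0, mem_gb=1.0, net_gbit=1.0, disk_gb=100.0)


def shortfalls(host: HostSpec, minimum: HostSpec = MINIMUM) -> List[str]:
    """Return one message per resource that is below the minimum."""
    problems = []
    for name in ("cpu_ghz", "mem_gb", "net_gbit", "disk_gb"):
        have, need = getattr(host, name), getattr(minimum, name)
        if have < need:
            problems.append(f"{name}: have {have}, need at least {need}")
    return problems


candidate = HostSpec(cpu_ghz=2.4, mem_gb=4.0, net_gbit=1.0, disk_gb=250.0)
print(shortfalls(candidate) or "meets minimum specs")
```

The check only covers the numeric requirements; connectivity to worker nodes and WAN/LAN access on a private network still need to be verified separately.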

14 Messages to T1 Sites
Initial set of Tier 1s
Production setups need to be usable for throughput tests of the experiments by end of June
–Need consistent meeting attendance of all sites
–Need all sites to join the monitoring setup asap so that we can start integration with
–Need DBA coverage during the experiment tests July-Oct
New Tier 1s - NDGF, PIC, NIKHEF/SARA and TRIUMF
Production setup needs to be usable for full production in October
–DB h/w and s/w setup planning will need to start soon (sites should start from current h/w request)
–Please let us know by when your h/w acquisition could/should start

15 Steps for Phase 1 sites
Get an account on the 3D wiki (in case you don’t have one yet)
–to be able to update the site status and configuration
Add all production clusters to 3D OEM setup
–Thanks to RAL for testing the procedure -> Documentation up to date?
Create schemas for replication tests
–Eva will provide documentation on the Wiki
Mark your site as being updated in the site status table on the wiki
Target date: before next meeting (Thu 22nd)

16 Messages to Experiments
Need the experiments to confirm or update their current database and/or Frontier requests for October production
–Need an experiment work plan from each experiment for period from July to October
–Database and Frontier
–A significant fraction of the service expected as of October needs to be demonstrated by middle of August
–The plan should address the full chain from online databases over offline to tier 1
This activity will require significant experiment resources to drive the tests

17 Timeline
26 June
–Meeting with LHCC referees - need site status & experiment plans for July-Oct
End of June
–Throughput phase closed, experiment application and throughput tests start
Early July
–3D DBA day to plan database setup options with new tier 1 sites. Hosted by one of the new sites?
August
–5 Day Oracle Admin Course @ CERN for Experiment/Site Database Teams
–3D workshop (experiments and sites) defining October setup and service
September
–Experiment ramp-up test with production setup of phase 1 sites
October
–Full service open at all tier 1 sites

