Deployment issues and SC3
Jeremy Coles
GridPP Tier-2 Board and Deployment Board
Glasgow, 1st June 2005


Current deployment issues
Main GridPP concerns:
– gLite migration, fabric management & future of YAIM
– dCache
– Data migration – classic SE to SRM SE
– Security
– Ganglia deployment
– Use of ticketing system
– Use of UK testzone
General:
– Jobs at sites – improving (n.b. Freedom of Choice is coming!)
– Few general EGEE VOs supported at GridPP sites

2nd LCG Operations Workshop
Took place in Bologna last week. Covered the following areas:
– Daily operations
– Pre-production service
– gLite deployment and migration
– Future monitoring (metrics)
– Interoperation with OSG
– User support (Executive Support Committee!)
– VO management processes
– Fabric management
– Accounting (DGAS and APEL)
– Little on security! Romain presented potential tools

LCG-2_4_0 Plan
[Chart: CPU counts by release version (LCG-2_4_0, 2_3_1, 2_3_0)]

Version change in the last 100 days
[Chart: release versions across all LCG-2 sites over the last 100 days; "Others" = sites on older versions or down]

[Charts: upgrade status by region – Canada, Russia, Italy, Germany/Switzerland, France, Asia Pacific, Northern Europe, South-West Europe, Central Europe, South-East Europe, UKI; regions with fewer than 5 sites are not shown]

LCG-2_4_0 lessons learned:
– Harder than expected (rate independent of packaging)
– Differences between regions --> ROCs matter
– Release definition is non-trivial with 3-month intervals
– Component dependencies: X without Y and V is useless…
– During certification we still find problems
– Both upgrade and installation from scratch are needed (time consuming)
– Test pilots for deployment are useful
– Early announcement of releases is useful
– We need to introduce "updates" via APT to fix bugs that show up during deployment
– Number of sites is the wrong metric for success: CPUs on the new release need to be tracked, not sites (a sketch of such a metric follows below)
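The CPU-weighted metric suggested in the last bullet is straightforward to compute. A minimal sketch in Python; the site names, versions and CPU counts are hypothetical, not real deployment data:

```python
# Minimal sketch of the CPU-weighted metric suggested above: measure the
# fraction of CPUs, not sites, that run the new release.
# Site names, versions and CPU counts are hypothetical.

sites = [
    {"name": "SiteA", "version": "LCG-2_4_0", "cpus": 800},
    {"name": "SiteB", "version": "LCG-2_3_1", "cpus": 120},
    {"name": "SiteC", "version": "LCG-2_4_0", "cpus": 60},
    {"name": "SiteD", "version": "LCG-2_3_0", "cpus": 40},
]

def cpu_coverage(sites, target="LCG-2_4_0"):
    """Fraction of all CPUs running the target release."""
    total = sum(s["cpus"] for s in sites)
    return sum(s["cpus"] for s in sites if s["version"] == target) / total

site_fraction = sum(s["version"] == "LCG-2_4_0" for s in sites) / len(sites)
print(f"Sites on 2_4_0: {site_fraction:.0%}")       # 50% of sites...
print(f"CPUs on 2_4_0:  {cpu_coverage(sites):.0%}")  # ...but ~84% of CPUs
```

With these made-up numbers, half the sites but about 84% of the CPUs are on the new release, which is exactly the distinction the bullet makes.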

The next release
Why?
– SC3 is approaching and the components it needs are not yet deployed at the sites
What?
– File transfer service (will need VDT 1.2.2): servers for Tier-1s and the Tier-0, clients for the rest
– Improved monitoring sensors for GridFTP
– RFC proxy extension for VOMS
– New version of the GLUE schema (compatible)
– LFC production service
– Interoperability with Grid3/OSG
– User-level stdio monitoring (maybe later)
– Bug fixes… as always
When?
– Aimed at mid-June
Who?
– Tier-1 centres and Tier-2 centres participating in SC3: as fast as possible
– Others? At their own pace
– An updated release (fixes from the 1st release) is expected by July 1st

Coexistence & extended pre-production
[Diagram: site running LCG and gLite side by side with shared services – FIREMAN, VOMS, LFC, SRM-SE, myProxy, gLite WLM, RB, UIs, WNs, gLite-IO, gLite-CE, FTS, LCG CE, R-GMA, BDII, DGAS, APEL]
– Data from LCG is owned by VO and role; the gLite-IO service owns gLite data
– FTS for LCG uses the user proxy; gLite uses a service certificate
– R-GMAs can be merged (security ON)
– CEs use the same batch system
– Independent IS, catalogue and access control

Gradual transition 1
[Diagram: shared site services – VOMS, LFC, SRM-SE, myProxy, gLite WLM, RB, UIs, WNs, gLite-CE, LCG CE, FTS, R-GMA, BDII, DGAS, APEL]
– FTS for LCG uses the user proxy; gLite uses a service certificate
– CEs use the same batch system
– Optional additional WLM and data management for LCG
– Optional DGAS accounting

Gradual transition 2
[Diagram: as above, with the LCG RB removed and FIREMAN added]
– LCG WLM removed
– Optional catalogue
– R-GMA in gLite mode

Gradual transition 3
[Diagram: as above, with gLite-IO and a second FTS added]
– Adding gLite-IO: a second path to data, with an additional security model
– Data from LCG is owned by VO and role; the gLite-IO service owns gLite data
– Data migration phase

Gradual transition 4
[Diagram: as above, with the LCG-side FTS removed]
– Finalize the switch to the new security model
– LFC is now a local catalogue under VO control
– BDII later replaced by R-GMA

Metrics - EGEE
General agreement on the concept; detailed discussions on:
– Time windows: sliding windows (week, month, 3 months)
– Quantities to watch (RCs, ROCs, CICs, …): ROCs based on RCs, CICs based on services
– Release quality has to be measured
To make progress: a workgroup to define the quantities (a sketch of a sliding-window metric follows below)
– Organized by Ognjen Prnjat
– Small (~5): Ognjen, Markus, Helene, Jeff T. and Jeremy
– Ognjen will collect input
– ROCs, CICs and the OMC have to agree on ONE set of quantities
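A sliding-window availability metric of the kind under discussion could look like the sketch below; the daily pass/fail results are hypothetical (in practice they would come from SFT runs):

```python
from datetime import date, timedelta

# Hypothetical daily pass/fail test results for one site (e.g. SFT runs):
# here the site fails every fourth day.
results = {date(2005, 6, 1) - timedelta(days=i): (i % 4 != 0) for i in range(90)}

def sliding_availability(results, end, window_days):
    """Fraction of days with passing tests in the window ending at `end`."""
    days = [end - timedelta(days=i) for i in range(window_days)]
    return sum(1 for d in days if results.get(d, False)) / window_days

today = date(2005, 6, 1)
for window in (7, 30, 90):   # week, month, 3 months
    print(f"{window:2d}-day availability: "
          f"{sliding_availability(results, today, window):.0%}")
```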

Operations summary
CIC-on-duty is now well established:
– COD is just 6 months old!
– Tools have evolved at a dramatic pace (portal, SFT, …), with many rapid iterations
– A truly distributed effort
– Integration of the new COD partner (Russia) went smoothly
– Tuning of procedures is an ongoing process; no dramatic changes (take resource size more into account)

Accounting
Last November this was still an area of concern.
APEL is now well established:
– Support for batch systems is improving
– Several privacy-related problems have been understood and solved
gLite accounting (DGAS):
– Some concerns about the amount of information published; can this be handled by proper authorization?
– Collaboration with APEL on batch sensors (BQS, Condor, …); DGAS agreed to provide them
– Will be introduced initially on a voluntary basis; sites will give feedback (including on privacy issues)
(A sketch of the per-VO aggregation such sensors feed is shown below.)
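Illustrative only: the end product of an APEL-style sensor is per-VO usage totals aggregated from batch job records. The record format and figures below are hypothetical:

```python
# Illustrative sketch: aggregate per-VO usage from batch-log-style job
# records, roughly what an APEL-like accounting chain produces.
# Record format and figures are hypothetical.
from collections import defaultdict

job_records = [
    {"vo": "atlas", "cpu_s": 7200, "wall_s": 7500},
    {"vo": "lhcb",  "cpu_s": 3600, "wall_s": 4000},
    {"vo": "atlas", "cpu_s": 1800, "wall_s": 2000},
]

usage = defaultdict(lambda: {"cpu_s": 0, "wall_s": 0, "jobs": 0})
for rec in job_records:
    totals = usage[rec["vo"]]
    totals["cpu_s"] += rec["cpu_s"]
    totals["wall_s"] += rec["wall_s"]
    totals["jobs"] += 1

for vo, t in sorted(usage.items()):
    print(f"{vo}: {t['jobs']} jobs, {t['cpu_s'] / 3600:.1f} CPU-hours, "
          f"{t['wall_s'] / 3600:.1f} wall-hours")
```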

Current deployment issues (recap)
Main GridPP concerns:
– gLite migration, fabric management & future of YAIM
– dCache
– Data migration – classic SE to SRM SE
– Security
– Ganglia deployment
– Use of ticketing system
– Use of UK testzone
General:
– Jobs at sites – improving (n.b. Freedom of Choice is coming!)
– Few general EGEE VOs supported at GridPP sites

Freedom of Choice - VO page
[Screenshot: Freedom of Choice VO selection page]

Service Challenge 3

SC timelines
[Timeline: SC2 → SC3 → SC4 → LHC service operation; first beams, cosmics, first physics, full physics run]
– Jun 05: Technical Design Report
– Sep 05: SC3 service phase
– May 06: SC4 service phase
– Sep 06: initial LHC service in stable operation
– Apr 07: LHC service commissioned
SC2 – reliable data transfer (disk-network-disk): 5 Tier-1s, aggregate 500 MB/s sustained at CERN
SC3 – reliable base service: most Tier-1s, some Tier-2s; basic experiment software chain; grid data throughput 500 MB/s including mass storage (~25% of the nominal final throughput for the proton period)
SC4 – all Tier-1s, major Tier-2s; capable of supporting the full experiment software chain incl. analysis; sustain the nominal final grid data throughput
LHC service in operation – September 2006; ramp up to full operational capacity by April 2007; capable of handling twice the nominal data throughput

Service Challenge 3 - phases
High-level view:
Throughput phase
– 2 weeks sustained in July 2005 ("obvious target" – the GDB of July 20th)
– Primary goals: 150 MB/s disk-to-disk to Tier-1s; 60 MB/s disk (T0) to tape (T1s)
– Secondary goals: include a few named T2 sites (T2 -> T1 transfers); encourage the remaining T1s to start disk-to-disk transfers
(A back-of-the-envelope on what these rates imply follows below.)
Service phase
– September to end 2005
– Start with ALICE & CMS; add ATLAS and LHCb in October/November
– All offline use cases except analysis
– More components: WMS, VOMS, catalogs, experiment-specific solutions
– Implies a production setup (CE, SE, …)
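For scale, a rough calculation of the data volumes the primary throughput goals imply over the two-week window (my arithmetic, not from the slides):

```python
# Back-of-the-envelope: data volume implied by the SC3 throughput goals
# sustained over the two-week window. Simple arithmetic, not from the slides.

SECONDS_PER_DAY = 86_400
WINDOW_DAYS = 14

for label, rate_mb_s in [("disk-to-disk (T0 -> T1)", 150),
                         ("disk-to-tape (T0 -> T1)", 60)]:
    volume_tb = rate_mb_s * SECONDS_PER_DAY * WINDOW_DAYS / 1e6  # MB -> TB
    print(f"{label}: {rate_mb_s} MB/s x {WINDOW_DAYS} days = {volume_tb:.0f} TB")

# 150 MB/s for 14 days comes to ~181 TB; 60 MB/s to ~73 TB.
```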

SC implications
SC3 will involve the Tier-1 sites (plus a few large Tier-2s) in July:
– Must have the release to be used in SC3 available in mid-June
– Involved sites must upgrade for July
– Not reasonable to expect those sites to commit to other significant work (pre-production etc.) on that timescale
– T1s: ASCC, BNL, CCIN2P3, CNAF, FNAL, GridKA, NIKHEF/SARA, RAL and …
Expect the SC3 release to include FTS, LFC and DPM, but otherwise to be very similar to LCG
September-December: experiment "production" verification of the SC3 services; in parallel, set up for SC4
Expect the "normal" support infrastructure (CICs, ROCs, GGUS) to support service-challenge usage
Biomed is also planning data challenges; we must make sure these are all correctly scheduled

SC3 issues
– The Tier-1 network is being extensively re-configured. Tests showed up to 40% packet loss! Waiting for UKLight to be fixed. Not intending to use dual-homing, but dCache have provided a solution (see the note on TCP and packet loss below)
– Lancaster link is up at the link level. What is the bandwidth of the Lancaster connection?
– Edinburgh: hardware problem with the RAID array to be used as the SE – IBM is investigating
– Lancaster set up a test system and is now deploying more hardware
– Need clarification about the classification of volatile vs. permanent data in respect of Tier-2s
– The file transfer service should be ready now, but has problems with the client component
– RAL would like a longer period for testing tape than suggested in the SC3 plans
– There has been an issue with CMS preferring PhEDEx over FTS for transfers. We need to add a period of PhEDEx-only transfer tests into the plans
– The dCache mailing list is very active now. There have been problems with the installation scripts
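To see why 40% packet loss is fatal for bulk transfers, the well-known Mathis et al. approximation bounds steady-state TCP throughput at C·MSS/(RTT·√p) for loss rate p. A quick sketch with assumed MSS and RTT values (not measured ones):

```python
# Mathis et al. approximation: per-stream TCP throughput <= C*MSS/(RTT*sqrt(p)).
# MSS and RTT below are illustrative assumptions, not measured values.
from math import sqrt

MSS = 1460     # bytes, typical Ethernet maximum segment size
RTT = 0.010    # seconds, assumed ~10 ms UK round trip
C = 1.22       # constant for periodic loss

for loss in (0.0001, 0.01, 0.40):
    bound = C * MSS / (RTT * sqrt(loss))      # bytes/s per TCP stream
    print(f"loss {loss:>7.2%}: <= {bound / 1e6:6.2f} MB/s per stream")

# At 40% loss a single stream is capped well below 1 MB/s,
# nowhere near the 150 MB/s SC3 disk-to-disk target.
```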

SC3 issues continued
– We have questions about whether FTS uses SRM-put or SRM-cp
– From September onwards the SC3 infrastructure is to provide a production-quality service for all experiments – remember the comments about UKLight being a research network – risk!?
– Differing engagement with the experiments; Edinburgh needs a better relationship with LHCb
– There is an LCG workshop in mid-June where the experiment plans should be almost final!
– GridPP needs to do more load testing than is anticipated in SC3
– Planning for SC4 needs to start soon. Currently we are pushing dCache, but DPM is also supposed to be available

Imperial (London Tier-2)
SRM/dCache status:
– Production server installed: gfe02.hep.ph.ic.ac.uk (information provider still in development)
– 1.5 TB pool node added: RHEL 4, 64-bit system, installed using the dcache.org instructions
– An extra 1.5 TB is ready to add when CMS is ready
– 6 TB being purchased; should be in place by the start of the Setup Phase
CMS software:
– Service node provided
– PhEDEx installed
– Confirmation sought on the FTS/PhEDEx issue

Edinburgh
Current LCG production setup:
– Compute Element (CE), classic Storage Element (SE), 3 Worker Nodes (2 machines, 3 CPUs)
– Monitoring takes place on the SE, running LCG
– About to add 2 Worker Nodes (2 CPUs in 1 machine); a User Interface (UI) is in testing
– A 22 TB datastore is available
Plans:
– £2000 available for 2 machines: one for dCache work and one to connect to EPCC's SAN (10 TB promised)
– Considering the procurement of more WNs, but have no clear requirements from LHCb

Lancaster (current)
[Diagram: current Lancaster setup]

Lancaster (planned)
1. LightPath and terminal endbox installed
2. Still require some hardware for our internal network topology
3. Increase in storage to ~84 TB, possibly ~92 TB with a working resilient dCache fed from the CE (see the note on replicas below)
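A note on "resilient" dCache: it keeps multiple replicas of each file (commonly a minimum of two) on worker-node disks, so raw disk does not translate one-to-one into usable space. A rough sketch with assumed figures, not Lancaster's actual configuration:

```python
# Rough sketch: usable space when part of the capacity sits in resilient
# dCache pools, which hold multiple replicas of each file.
# All figures are assumptions for illustration.

dedicated_tb = 84      # dedicated storage pools, single copy
resilient_raw_tb = 8   # worker-node disk pooled via resilient dCache
replicas = 2           # minimum replica count in resilient mode

usable_tb = dedicated_tb + resilient_raw_tb / replicas
print(f"Raw:    {dedicated_tb + resilient_raw_tb} TB")
print(f"Usable: {usable_tb:.0f} TB")   # ~88 TB under these assumptions
```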

Other areas…

JRA4 request
– We have some idea of requirements from the networking experts within JRA4
– Draft requirements document available here: …
– Draft use case document available here: …
– We're looking for more input from NOCs and GOCs
– If you have requirements, use cases or opinions on interfaces or needed metrics, please send them to us
– Even if you don't have ideas at the moment but would like to be involved in the process, please get in contact
– Contact details are at the end of the talk

DTEAM discussion
– Review of team objectives – what is the team focus for the next 3 & 5 months?
– Communications with the experiments
– Using a project tool to work better as a team
– Metrics!!
– Review of plans and what needs to be done to keep them up to date, including GridPP challenges and SC4
– Web-page status
– Areas raised at the T2B and DB meetings
– Security challenge involvement
– Accounting – status and making further progress
– Libraries and understanding experiment needs
– Review of dCache efforts
– Address issues with quarterly and weekly reports
– Next release, test-zone and test-zone machines
– Data management – guidelines required
– Improving robustness
– GI – documentation (esp. releases), multi-tier R-GMA, introduction of new sites, LCFGng distribution (Kickstart & PXE boot…), jobs – how to get …