Deployment Summary GridPP11 Jeremy Coles 15th September 2004.


1 Deployment Summary GridPP11 Jeremy Coles J.Coles@rl.ac.uk 15th September 2004

2 Overview
– What is deployment all about anyway?
– Who is doing it?
– Planning and metrics
– Issue 1: Communications
– Issue 2: Fabric management
– Where are we now?

3 Are the developers bailing out? Who is flying the plane? We have paying passengers – do we know where we are going? … oh, and can we keep it working, navigate, land and offer a real service?

4 Who is flying the plane?
Number | Position | Status
General
1 | Production manager | In place and fully engaged
1 | Applications expert | Identified but not formally engaged
2 | Tier-1/deployment expert | In place and fully engaged
4 | Tier-2 coordinators | In place and fully engaged
0.5 | VO management | Will be part time but not yet in place
9.0 | Hardware support | Post allocated but not yet filled
Specialist
1 | Data and storage management | Existing expert
1 | Workload management | Existing expert
1 | Security officer | Not yet recruited
1 | Networking | Starting in September
… introducing the err… DTEAM + site system administrators + …

5 Deployment Board
Replaces the GridPP1 Technical Board
Mandate:
– Determine and oversee execution of the technical plan
– Report to the PMB
– Ensure GridPP-wide issues are discussed and solved
– Provide a forum for technical information exchange
– Oversee deployment and use of GridPP hardware
– Tier-1/Tier-2 coordination and liaison
– Ensure integration of external technical developments

6 DB members
– Production Manager
– Tier-1/A Manager
– 4 Tier-2 Technical Coordinators
– HEP SYSMAN chair
– CERN T0/Deployment
– Applications Area Coordinator
– Middleware Area Coordinator
– Technical experts (invited by the DB chair)
– UK NGS
– EGEE/Ireland
– DB chair
Total: ~18 people

7 DB relations
[Diagram: relations between the DB and the PMB, UB, T1AB, T2B, M/S/N, APPS, LCG/EGEE/CERN T0, UK NGS and the GridPP DTEAM]

8 What must deployment address?
Core infrastructure services
– Resource brokers
– Information services
– Data management services
– Virtual Organisation management
– Replica Location Service
– BDII
Grid monitoring
– Monitor operational performance
– Monitor operational state
– Problem resolution + operations support tools
Middleware deployment
– Required local validation of common middleware
– Feed back issues to LCG/EGEE
– Continuous upgrade mechanism(s)
Resource induction
– New site joining procedures
– Provide support for middleware installation
– Advise on operational procedures
Resource support
– Respond to and coordinate resolution of fabric problems
– Engage the wider community to resolve new problems
User support
– Provide a support service for users (filter and distribute)
– Monitor effectiveness of support
– Provide training and induction courses
– Documentation (and its quality)

9 Areas (2)
Communication
– Representation within experiments
– Procedures and mechanisms within the community
Applications
– Ensuring local VOs receive support and guidance
– Participate in testing and validation exercises
Components
– Workload management
– Data management
– Storage management
– Information services
Network services
– Network performance monitoring
– Demand (aggregate traffic) vs supply (performance)
– Resource allocation/reservation
Inter-grid collaboration
– Participate in discussions to work more closely with other Grids
– Ensure interoperability of infrastructure and services
Service-level agreements
– Monitor Tier-2 compliance with MoUs
– Access policies
Security
– Certification authority
– Implement and monitor policy
– Incident response
– Policy management
Operations planning
– Understand usage patterns
– Capacity planning
– Monitoring problems log

10 Navigation
– No clear plans within LCG for overall deployment (improving)
– Some confusion about EGEE connections
– The GridPP2 project plan is not complete and we have dependencies
– Currently developing in a "best guess" environment
– It is not always clear exactly where decisions get made
What does the planning environment look like so far? There are already pressing issues to be addressed:
– What is the UK stance regarding fabric management tools (LCFGng is being phased out)?
– How are we going to measure deployment and operations success? (metrics)
– What is the communications plan, given that LCG-ROLLOUT has become a gossip column? (support, news, problem reporting)

11 Are we communicating…?
Areas
– Grid news: no well-defined broadcast route (e.g. for middleware updates)
– Site news: operational incidents on the Grid, site updates
– Support: user, deployment
– Problems: as found by daily tests or discovered by users
Issues
– LCG-ROLLOUT is overloaded!
– Lack of visibility about what is happening at sites (upgrades, site problems)
– Problems may generate many queries
– No tracking of support requests or logging of queries, and therefore poor ability to search for others' experiences
Options
1) Set up a new news area based on RSS (new entries are placed in categories that people can register to receive updates from), or just use the GOC pages?
2) Establish a support desk for GridPP, though there are concerns about expertise
3) A DTEAM area and better documentation
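The RSS option relies on each news entry carrying one or more categories, with readers registering only for the categories they care about. A minimal Python sketch of the reader side, assuming standard RSS 2.0 `<category>` elements; the feed contents, category names (`grid-news`, `site-news`, `problems`) and the helper `items_for` are hypothetical and not taken from any GridPP or GOC service:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample feed: titles and category names are illustrative only.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Deployment news (example)</title>
    <item><title>Middleware update released</title><category>grid-news</category></item>
    <item><title>Site down for upgrade</title><category>site-news</category></item>
    <item><title>Daily test failures</title><category>problems</category></item>
  </channel>
</rss>"""


def items_for(feed_xml, subscriptions):
    """Return titles of items (in feed order) tagged with any subscribed category."""
    root = ET.fromstring(feed_xml)
    selected = []
    for item in root.iter("item"):
        # An RSS 2.0 item may carry several <category> children.
        categories = {c.text for c in item.findall("category")}
        if categories & subscriptions:
            selected.append(item.findtext("title"))
    return selected


if __name__ == "__main__":
    # A site administrator registered for site news and problem reports only.
    print(items_for(SAMPLE_FEED, {"site-news", "problems"}))
```

Keeping the category on each entry, rather than running one list per topic, is what would let a single feed replace a catch-all list while each reader still sees only the relevant subset.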

12 An example: the [LCG-Problems] mailing list has 2 members!

13 Are we going up or down? Metrics: work in progress!

14 Metrics (2): work in progress!

15 Maintenance
Migration to SL3 is starting. The next public release of LCG supports SL3 WNs; certification is complete.
– Service nodes remain at RH 7.3 for now.
– LCFGng is not an option for SL3 nodes.
– LCG supports one install method for SL3: a manual install technique (actually not very manual) that can be built into any framework already in use.
– Kickstart and scripts, Cfengine, NPACI Rocks, Quattor, Stateless Linux or even LCFG
This release is expected this month.

16 Quattor
Community effort for a Quattor installation of LCG2 is nearing completion (98% done).
Quattor has a similar architecture and concept to LCFG, so the LCFG effort is not wasted.
Advantages
– CERN and the RAL Tier1/A will use Quattor for LCG, so support and self-help are available for others.
– LCG middleware will not be tied to or released with Quattor.
Disadvantages
– A lot to learn before any payback.

17 Steve's 5 questions
– Once the SL3 port is available, is RH 7.3 still wanted anywhere?
– Is an OS other than SL3 needed for GridPP sites and users?
– Does any site have a conflict with the proposed deployment of LCG onto SL3?
– Is there a site to work with RAL on learning Quattor?
– Should the UK use, or at least favour, one fabric management solution?
Responses given on the slide:
– Yes, probably Quattor
– Maybe on a very few shared sites
– Need to ask the experiments; perhaps if CERN upgrades soon
– No, most want to move off RH 7.3
– Manchester?

18 Summary
– Smooth running. Easy and seamless deployments. Service quality.
– The DTEAM!
– The plans (& metrics) are being developed; there are many dependencies.
– LCFG will be phased out. Quattor on SLC3 is coming.
– LCG2 deployed: 1500+ CPUs.
– LCG-ROLLOUT needs to migrate to news & helpdesk services.

