INFSO-RI-508833 Enabling Grids for E-sciencE Fabric and Management WG Davide Salomoni NIKHEF Lyon, ARM-3 –


1 INFSO-RI-508833 Enabling Grids for E-sciencE www.eu-egee.org Fabric and Management WG Davide Salomoni (Davide.Salomoni@nikhef.nl) NIKHEF Lyon, ARM-3 – 17-18/3/2005

2 Enabling Grids for E-sciencE INFSO-RI-508833 EGEE ARM-3, 20050317
Agenda
- Background
- Existing Initiatives
- Areas of Interest
- Actions

3 This Presentation
A bit of history
- During the last LCG Operations Workshop (CERN, Nov 2-4, 2004) the “Operational Fabric WG” was set up.
- Reasons:
  * A weak point of the existing (LCG) infrastructure was identified at the time as the unreliability of fabric infrastructures (this is not to say that middleware or applications were always 100% reliable).
  * Several participants in the meeting expressed interest in sharing and acquiring knowledge on how to improve fabric management procedures.
Why here
- Hopefully we’ll come out of this session with an agreed action list to start this up within SA1.
Format note: this session is especially meant to be interactive.

4 Existing Efforts
There are several local, national or regional initiatives already dealing with fabric management issues. For example:
- There are dedicated working groups (or interest groups) in the UK and Italy.
- CERN contributes to fabric topics through several channels (e.g., the Functional Tests, Quattor, YAIM).
- Several resource centers have developed (more or less local) procedures, guides, expertise.
- LCG has some established channels:
  * E.g.: the GOC Wiki, the omnipresent LCG-ROLLOUT.
  * The LCG Operations Workshop series.
- SA1 is directly involved in several areas (e.g. through the CIC-on-duty activity).
  * Last but not least, this has an impact on existing or future SLA discussions.

5 So, why another group?
First, what if we don’t do anything?
- We miss opportunities to simplify our (SA1 hat here) job and make the grid more reliable, usable, and used. And we will probably keep on:
  * Not receiving answers to questions. Or the answers might be there, but buried, say, within LCG-ROLLOUT. Or some people whom we don’t know might be able to answer them, but how do we find out where these people are?
  * Spending an inordinate amount of time going through all the EGEE bodies, trying to figure out who does or knows what, etc.
- There is of course also the risk that, partly because of the issues being discussed here, the EGEE Reviewers will get fairly dissatisfied with EGEE.
My overall view and goal for this group:
- With the general purpose of helping to set up and operate EGEE from a fabric point of view, connect several worlds that all have some inter-relation with each other. For example:
  * From local, national, regional initiatives, learn and share what has been done and which problems are being discussed there.
  * From LCG-ROLLOUT, get troubleshooting/installation/monitoring info.
  * From the security group(s), get info to save us all a lot of trouble.
  * From the Quattor WG, input may come that is useful to people using YAIM. And vice versa.
  * From the LCG Service Challenge, extract experience useful to set up a grid site (not necessarily a Tier 1 or Tier 2 site).
  * …

6 Areas of Interest
LRMS issues
- Torque/Maui, LSF, SGE, possibly others. Products that are complex to configure, sometimes subject to active development, sometimes individually patched to fix quirks or add features.
  * A first effort: the Maui cookbook now delivered with LCG. But not much feedback has been received so far.
  * Often there are (new) LRMS features that are known only to a few people, while they could be put to use at many sites.
How do I monitor my fabric?
- GridICE. Lemon. Ganglia. The Functional Tests suite. Throw in a few custom scripts for good measure. Do I need all of these? Do I need any of these? (Or, are the grid nodes supposed to do something else besides monitoring and being monitored?)

7 Areas of Interest (cont.)
How do I set up a fabric for grid use?
- This is obviously a very broad question, but some sites (esp. small/medium-sized ones) just want pointers to get started. A good piece of work is the SEE-GRID draft.
- For example:
  * Hardware configurations
  * How do I (ahem) install my nodes?
  * Redundancy problems (how do I make sure my site survives if the CE/LRMS goes down?)
  * Middleware installation (I run Debian! Or SUSE, or something else)
  * How many grid service nodes do I need or want?
  * Which network setup should I consider?
An example of issues taken from a recent email (Rafael Leiva):
- Grid middleware is difficult to install and configure
- Grid technology needs very stable and well-configured environments (think security, effect of misconfigured sites on other grid resources)
- The requirements to be part of a Grid are very strict
- Grid problems are the same for small, medium and large sites
- Too many maintenance stops in production grid sites
- Site administrators do not want to change current management tools

8 Areas of Interest (cont.)
Work on implementing some of the EGEE Review Recommendations
- See Cristina’s slides
- Nov/Dec 2005 (!): EGEE focused review
- We DO need to formalize problem areas, and in a very short time (see actions, next slide).
  * Proactive (event-based) monitoring.
  * In case of a security incident, how could my site be isolated in the shortest possible time? (Possibly avoiding false positives!)

9 Actions?
Organization of this WG
- Who should participate? A hierarchical structure (per-region SA1 representatives collect issues/info)? Or a broader participation? I suggest a mixed approach:
  * We need reps from each region, able to express regional interest in fabric topics. Can we take an action here so that each ROC manager communicates to the list a name (or set of names)?
  * But we also may want (or need) to encourage sites to participate directly, with the caveat that this should not turn into another LCG-ROLLOUT list. This is all the more true if we decide to embark on implementing EGEE Review recommendations.
- A mailing list (of course…)
  * project-eu-egee-sa1-fabric-working-group (subscribe via http://listboxservices.web.cern.ch/listboxservices/)
- Face-to-face meetings
  * 2-3 times per year? Can they (do we want them to) coincide with EGEE general meetings? With ARM meetings? With LCG Operations meetings? Next LCG Operations Meeting: 24-26/5/2005, Bologna
  * Phone meetings perhaps less useful

10 Actions (cont.)
Inventory!
- Identify global activity areas (à la EGEE Review)
- And we also need to know where local interest and knowledge are.
  * Send this info to the list.
- Which tools are available or in use today? I am sure there is a sizeable amount of excellent material (guides, programs) developed across EGEE that could be re-used somewhere else. Or perhaps replaced by something more complete/simple/effective.
Do we need another web site and/or wiki?
- We already have the GOC wiki, for example, and it contains lots of useful info. But only a limited number of people publish there, and it is a bit difficult to keep track of additions/changes.
Keep a broad approach
- We need to be in touch with middleware developers and integrators. How do we do this? (The SA1 / JRA1 / JRA3 / NA3 / NA4 / PEB / PTF / whatnot problem, see e.g. Ognjen’s presentation)
- Do not identify with specific solutions; e.g., Quattor vs. YAIM: both probably have their audience, and solve [and create!] different problems.

