EGEE-II INFSO-RI-031688 Enabling Grids for E-sciencE www.eu-egee.org Postmortem gLite-3.0.0 postmortem an examination of a dead body to determine the cause.

Slides:



Advertisements
Similar presentations
INFSO-RI Enabling Grids for E-sciencE Update on LCG/EGEE Security Policy and Procedures David Kelsey, CCLRC/RAL, UK
Advertisements

EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks gLite Release Process Maria Alandes Pradillo.
EGEE-II INFSO-RI Enabling Grids for E-sciencE NA2 Tools & Services Owen Appleton, NA2 Deputy Manager EGEE/EGEE-II Transition meeting.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks From ROCs to NGIs The pole1 and pole 2 people.
Robert Sopko Stephen Miller Amy Gandhi Jazimar Bailey.
INFSO-RI Enabling Grids for E-sciencE The gLite Software Development Process Alberto Di Meglio CERN.
INFSO-RI Enabling Grids for E-sciencE SA1: Cookbook (DSA1.7) Ian Bird CERN 18 January 2006.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks What GGUS can do for you JRA1 All hands.
EGEE is a project funded by the European Union under contract IST Testing processes Leanne Guy Testing activity manager JRA1 All hands meeting,
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Steven Newhouse EGEE’s plans for transition.
Help Yourself.  FAQ  Message Board  Bug Tracker  Call Tracking System  Web Site.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks LHCOPN operations Presentation and training.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks PPS All sites Meeting: Introduction & Agenda.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROD model assessment ROC UKI John Walsh.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks NA3 Resources Robin McConnell.
INFSO-RI Enabling Grids for E-sciencE SA1 and gLite: Test, Certification and Pre-production Nick Thackray SA1, CERN.
INFSO-RI Enabling Grids for E-sciencE Integration and Testing, SA3 Markus Schulz CERN IT JRA1 All-Hands Meeting 22 nd - 24 nd March.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Multi-level monitoring - an overview James.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Overview of STEP09 monitoring issues Julia Andreeva, IT/GS STEP09 Postmortem.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Report from GGUS BoF Session at the WLCG.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Stephen Childs Trinity College Dublin &
INFSO-RI Enabling Grids for E-sciencE Enabling Grids for E-sciencE Pre-GDB Storage Classes summary of discussions Flavia Donno Pre-GDB.
ATLAS Experience with GGUS Guido Negri INFN – Milano Italy.
INFSO-RI Enabling Grids for E-sciencE EGEE SA1 in EGEE-II – Overview Ian Bird IT Department CERN, Switzerland EGEE.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The EGEE User Support Infrastructure Torsten.
EGEE-III INFSO-RI Enabling Grids for E-sciencE Antonio Retico CERN, Geneva 19 Jan 2009 PPS in EGEEIII: Some Points.
8 th CIC on Duty meeting Krakow /2006 Enabling Grids for E-sciencE Feedback from SEE first COD shift Emanoil Atanassov Todor Gurov.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Regional Dashboard Cyril L’Orphelin - CNRS/IN2P3.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROC Security Contacts R. Rumler Lyon/Villeurbanne.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The future of the gLite release process Oliver.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Progress on first user scenarios Stephen.
Documentation (& User Support) Issues Stephen Burke RAL DB, Imperial, 12 th July 2007.
INFSO-RI Enabling Grids for E-sciencE ARDA Experiment Dashboard Ricardo Rocha (ARDA – CERN) on behalf of the Dashboard Team.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Task tracking SA3 All Hands Meeting Prague.
WLCG Software Lifecycle First ideas for a post EMI approach 0.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Standard network trouble tickets exchange.
Last update 29/01/ :01 LCG 1Maria Dimou- cern-it-gd Maria Dimou IT/GD CERN VOMS server deployment LCG Grid Deployment Board
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Communication tools between Grid Virtual.
EGEE-II INFSO-RI Enabling Grids for E-sciencE Operations procedures: summary for round table Maite Barroso OCC, CERN
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks SA3 partner collaboration tasks & process.
EGEE-III INFSO-RI Enabling Grids for E-sciencE SA3 All Hands Meeting 'Cluster of Competence' Experience SA3 INFN Cyprus May 7th-8th.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Middleware Update Maria Alandes Pradillo.
EGEE-II INFSO-RI Enabling Grids for E-sciencE gLite Markus Schulz CERN-IT-GD-ITR EGEE-II SA3.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Developing YAIM modules for Sun Grid Engine.
INFSO-RI Enabling Grids for E-sciencE gLite Certification and Deployment Process Markus Schulz, SA1, CERN EGEE 1 st EU Review 9-11/02/2005.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Patch Preparation SA3 All Hands Meeting.
Enabling Grids for E-sciencE INFSO-RI Enabling Grids for E-sciencE Gavin McCance GDB – 6 June 2007 FTS 2.0 deployment and testing.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks MSA3.4.1 “The process document” Oliver Keeble.
INFSO-RI Enabling Grids for E-sciencE gLite Test and Certification Effort Nick Thackray CERN.
Operations model Maite Barroso, CERN On behalf of EGEE operations WLCG Service Workshop 11/02/2006.
INFSO-RI Enabling Grids for E-sciencE Operations Parallel Session Summary Markus Schulz CERN IT/GD Joint OSG and EGEE Operations.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Pole 2 : Restructuration of the OPS Manual.
INFSO-RI Enabling Grids for E-sciencE Upcoming Releases Markus Schulz CERN SA1 15 th June 2005.
EGEE-III INFSO-RI Enabling Grids for E-sciencE JRA1 and SA3 All Hands Meeting December 2009, CERN, Geneva Product Teams –
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks EGEE Operations: Evolution of the Role of.
JRA1 Meeting – 09/02/ Software Configuration Management and Integration EGEE is proposed as a project funded by the European Union under contract.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks What all NGIs need to do: Helpdesk / User.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Best Practices and Use cases David Bouvet,
EMI INFSO-RI Testbed for project continuous Integration Danilo Dongiovanni (INFN-CNAF) -SA2.6 Task Leader Jozef Cernak(UPJŠ, Kosice, Slovakia)
INFSO-RI Enabling Grids for E-sciencE Software Process Author: Laurence Field (CERN) Presented by David Smith JRA1 All Hands meeting,
II EGEE conference Den Haag November, ROC-CIC status in Italy
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks Grid is a Bazaar of Resource Providers and.
GGUS summary (3 weeks) VOUserTeamAlarmTotal ALICE7029 ATLAS CMS LHCb Totals
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks ROC model assessment AP ROC ShuTing Liao.
EGEE-III INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks The Dashboard for Operations Cyril L’Orphelin.
EGEE-II INFSO-RI Enabling Grids for E-sciencE EGEE and gLite are registered trademarks CYFRONET site report Marcin Radecki CYFRONET.
Testing under the Agile Method CSCI 521 Software Project Management based on the book Testing Extreme Programming by Lisa Crispin and Tip House.
EGEE-III INFSO-RI Enabling Grids for E-sciencE COD EGEE09 Barcelona Pole-2 Restructuring of Procedures Vera Hansper.
Quality Control in the dCache team.
Presentation transcript:

EGEE-II INFSO-RI Enabling Grids for E-sciencE Postmortem gLite postmortem an examination of a dead body to determine the cause of death. an analysis or discussion of an event held soon after it has occurred, esp. in order to determine why it was a failure : an election postmortem on why the party lost. postmortem an examination of a dead body to determine the cause of death. an analysis or discussion of an event held soon after it has occurred, esp. in order to determine why it was a failure : an election postmortem on why the party lost.

EGEE-II INFSO-RI Enabling Grids for E-sciencE Bug-Issue-State-Progress Tracking (before and after the release)

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 3 Tracking During release preparation: –Savannah  LCG components Release relevant and general problems mixed  gLite components Release relevant and general problems mixed –Pre-production mailing list –Open issues via Wiki Page  Here issues are jointly listed  bug numbers referenced  Status changes manually updated -----> consistency with bug trackers????? –Test status  Some components via Web result page  Some components via messages (works/doesn’t)  Impossible to link test results with status of release  Situation improved with additional Wiki table for test tracking Manually updated (trust?)

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 4 Tracking II During Rollout: –Savannah  LCG components Release relevant and general problems mixed  gLite components Release relevant and general problems mixed –GGUS ticketing system  Install problems  User problems  Software problems  Configuration problems  Answered or transferred to Savannah Who closes when what? –Mailing lists & private communication  We did not manage to stick to the rule “If it is not in a tracker it is not a problem” Never a clear, visible status of the problems > Doubt, Rumours, ….

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 5 Tracking III In depth research needed to answer questions like this: –Which bugs are fixed in gLite RC2? –Is bug #1216 fixed in gLite RC2? –Which tag version/RPM version fixes bug #1786? –New bug opened during certification outside the cert process  Does this affect the version on the certification testbed? Some changes underway: Merging all Savannah tracker –With additional information –Need for Savannah “data mining tools”  Security issues (all DB tables of all project owned by one user) New integrated process (in progress) GGUS ticketing not resolved

EGEE-II INFSO-RI Enabling Grids for E-sciencE Communication (?)

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 7 Communication Project management -> release team –Release team was aware of the target release date (May 1st) and derived dates  -6 weeks pre-pro (was missed!!!) –End user documentation team assumed 1st of June Project management -> experiments –Different views in the experiments about what will be in the release and how long this can be negotiated  ATLAS ( gLite-CE not needed) –Release date clear Project management/ release team -> developers –Release date took several of the developers by surprise  Assumed 1st of June as closing date  Totally unaware of the date Release team -> ROCs –Pre-release before eastern was not seen as an indication that the release will come on time Release team/ ROCs -> sites –Most of the sites are not aware of the release schedule

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 8 Communication II System Elements: Release Coordinator Developers Integrators CertTestBed Manager(s) Testers Problem statement: All have to synchronize some of their activities with each other and some need to have an aggregated, up to date, status view Most problems need more than one iteration on each comm. channel “some of their activities” -----> 100% are 100% active > communication handshakes fail without being noticed for hours This was extremely expensive –Time –Source of conflicts This has to be more formalized (+ common sense)

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 9 Communication III Spreading the gospel (Announcing releases) –Shotgun approach  Rollout, glite-announce, ROCs, forward to all management lists … –Rollout list  a discussion forum where all kinds of interesting ideas are floated  new releases are treated like this  frequently ignored/ seen as an recreational activity –Release pages  Static, only visited when people got the signal –ROCs  Don’t reach the extension projects  Different speed limits Need to use gLite-announce as a one way channel Convert rollout to a discussion list/ newsgroup Maintain an interesting news page

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 10 Communication IV Rumors and scaremongering –A lot of time and energy is devoted to verify or falsify rumors –Constant review mode for SA3/SA1 Bashing CERN-IT based activities and staff is widespread –Samples:  No gLite-CE ever worked  gLite-3-0 was release without testing  ……. –This has a severe effect for the “moral” in the trenches –People try to avoid making mistakes by all means > productivity meltdown We have to report openly and truthfully (and …. maybe we have to bite back a bit more)

EGEE-II INFSO-RI Enabling Grids for E-sciencE Documentation

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 12 Documentation Problems –Too much –Not enough –New and old –Quality –Hard to find –Wrong/outdated –Too aggregated –No “one stop” shop –Branding –Hard to maintain –Documentation from partners not linked –Not read

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 13 Documentation Too much –We basically provided LCG gLite gLite-3.0 specific –Separated already information on non included components Not enough –Material on boot strapping a site has to be added again  Was there in early LCG-2 releases  What is a site, central services etc.  Which notes, hardware requirements, how to start New and old –Site managers get confused by finding different instructions for the same components ( tarball UI) Quality –Need to proof read documentation and check links before release –Wiki material has to be reviewed and removed/updated

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 14 Documentation Hard to find –Documentation is scattered between several pages –Several Wiki sites with information –Guide to documentation needed  short summary for all documents Wrong/outdated –See “quality” –Good example “Testing your site” –Review!!! Too aggregated –YAIM description and install guide is mixed No “one stop” shop –We need a high level intro Branding –No uniform naming of documents/ services –No classification of documents –No common appearance

Enabling Grids for E-sciencE EGEE-II INFSO-RI POSTMORTEM 15 Documentation Hard to maintain –Some of the documentation can be only build by an elite –Can have site effects Documentation from partners not linked –Example:  Material that covers other batch systems  Special setups via NAT Not read –Especially the “Release Notes” are not read by half of the sites