Certification & Testing
LCG Certification & Testing Team (C&T Team)
Marco Serra - CERN / INFN
Zdenek Sekera - CERN



LCG Software
LCG software is:
  – Globus (subset of the VDT distribution) delivered by iVDGL
  – EDG WP1 (Workload Management System - “Resource Broker”)
  – EDG WP2 (Data Management System)
  – Several components from other EDG WPs (config objects, InfoProviders, …)
  – GLUE 1.1 (Information schema)
  – MDS-based Information System (Globus) with LCG enhancements
  – “StorageElement” (disk-based only, gridFTP), MSS via GFAL and SRM soon
  – LCG modifications and developments:
      Job managers to avoid shared file-system problems
      MDS – BDII LDAP
      Globus gatekeeper enhancements (adding some accounting and auditing features and log rotation, which LCG requires)
      Many, many bug fixes to EDG and Globus/VDT
      GFAL lib to access MSS

Certification Process: Scope
the software that LCG is deploying has never been used in a large-scale production system!
the goal of the certification process is to provide LCG with production-quality software satisfying experiment requirements
“production quality”:
  – stability, robustness, availability 24h x 7d
  – performance, scalability, graceful degradation
  – operability
  – maintainability
  – user compliant

Certification Process
feature testing
  – Workload Management System (WMS), Data Management (DMS), Information System (IS), ...
different grid architectures / configurations
  – simulating the production service
stress tests
  – single components, overall system
destructive tests: error recovery
  – injecting problems to study system behaviour
security
  – basic issues

Running the Certification
2 major activities:
integrating components into LCG software
  – verifying its consistency (comes from several sources)
  – defining packaging and installation procedures
  – first level of debugging, testing of bug fixes
C&T-testbed is where the LCG release is built
running a matrix of tests to cover all the relevant items
  – functionalities, stress tests, security
changing the C&T-testbed setup
  – different architectures, destructive tests

Certification is an Iterative Process
[Flow diagram. Areas: Certification testbed; Deployment.
 Actors: GD C&T section (add features, fix problems, transmit problems); EDG (fix problems, new releases); VDT (fix problems, new releases).
 Steps: Integrate; Basic Functionality Tests; Run C&T test suites and site test suites; Run Certification Matrix; Run Loose Cannons & Experiments; Release Candidate; RELEASE (certified release).
 Each test step is followed by an "errors?" decision: "yes" leads to "fix problems", "no" leads to the next step. A release candidate that is not acceptable also goes back to "fix problems".
 Feedback from Deployment returns to the certification testbed.]
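
The loop in the diagram can be sketched in a few lines of Python. This is only an illustration, not the C&T team's tooling: the stage order, the rule of restarting from the first stage after every fix, and the names `certify`, `stages` and `fix` are all assumptions made for the example.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True when no errors were found.
# The check functions here are hypothetical stand-ins for real test runs.
Stage = Tuple[str, Callable[[str], bool]]


def certify(release: str,
            stages: List[Stage],
            fix: Callable[[str, str], str],
            max_rounds: int = 10) -> Tuple[str, List[str]]:
    """Run the stages in order; on the first failure apply a fix and start over.

    Assumption: a fixed release is re-integrated and re-tested from the first
    stage, which is one possible reading of the "fix problems" arrows.
    """
    log: List[str] = []
    for _ in range(max_rounds):
        for name, passes in stages:
            if not passes(release):
                log.append(f"{name}: errors in {release}")
                release = fix(release, name)  # produces a new candidate
                break
        else:
            # No stage reported errors: the candidate becomes the release.
            log.append(f"certified: {release}")
            return release, log
    raise RuntimeError("release candidate not acceptable after max_rounds")


# Hypothetical run: the candidate "rc0" fails the test suites once.
stages = [
    ("Basic Functionality Tests", lambda r: True),
    ("C&T and site test suites", lambda r: r != "rc0"),
    ("Certification Matrix", lambda r: True),
    ("Loose Cannons & Experiments", lambda r: True),
]
release, log = certify("rc0", stages, fix=lambda r, stage: "rc1")
```

The `for ... else` form keeps the "no errors at any stage" path separate from the "fix and retry" path, which mirrors the yes/no branches of the diagram.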

C&T - Testbed
[Diagram labels: User Interface; workload management; data management; information system; cluster 1 (PBS); cluster 2 (Condor); cluster 3 (LSF); cluster Budapest; cluster Taipei; cluster Wisconsin. Caption: simulating the real service.]


Certification Matrix (main items)
globus
  – main functionalities, collaborative activity ongoing with VDT test team
workload management system
  – load distribution, resource saturation, ...
data management system
  – data access, replication, catalog consistency, ...
security
  – certificates, ...
services interaction
  – jobs with input data, jobs with MSS access, ...
  – different batch systems (OpenPBS, LSF, Condor)
[Diagram labels: Grid Unit Testing; Grid Services Testing]
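
One way to picture the "services interaction" rows is as a grid of test items against batch systems, where every cell has to be run. The sketch below is an assumption about how such a matrix could be tracked; the helper names `build_matrix`, `record` and `open_cells` are hypothetical, and only the test items and batch-system names come from the slide.

```python
from itertools import product

# Values taken from the slide.
BATCH_SYSTEMS = ["OpenPBS", "LSF", "Condor"]
INTERACTION_TESTS = ["jobs with input data", "jobs with MSS access"]


def build_matrix(tests, systems):
    """One cell per (test, batch system) pair, initially marked as not run."""
    return {(t, s): "not run" for t, s in product(tests, systems)}


def record(matrix, test, system, passed):
    """Store the outcome of one run; reject cells that are not in the matrix."""
    if (test, system) not in matrix:
        raise KeyError(f"unknown matrix cell: {(test, system)}")
    matrix[(test, system)] = "ok" if passed else "failed"


def open_cells(matrix):
    """Cells that still need attention: anything not marked ok."""
    return sorted(cell for cell, status in matrix.items() if status != "ok")


# Hypothetical outcomes, for illustration only.
matrix = build_matrix(INTERACTION_TESTS, BATCH_SYSTEMS)
record(matrix, "jobs with input data", "LSF", passed=True)
record(matrix, "jobs with MSS access", "Condor", passed=False)
```

Keeping "not run" distinct from "failed" makes it visible when part of the matrix was never exercised, as opposed to exercised and broken.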

Test Results (examples)
job submission tests
  – various and complex tests, success rate ~97%
  – ~1000 jobs in the system for 2 days: ok
  – ~1000 jobs to a single cluster (ComputingElement): ok
  – resources fully utilized
  – load correctly distributed: it is a function of the CPUs available in each cluster ...
data management tests
  – different protocols, replicating small/big data files: ok
  – single stream (~2000 files), multiple streams (~750 files): ok
long jobs (running > 24h)
  – testing proxy renewal: ok
guiding jobs to data
  – to check that jobs are dispatched only to clusters that allow access to specific files with a specific protocol: ok
crashes of services strongly reduced after the certification & debugging process
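
The load-distribution check can be illustrated with a small calculation. The slide only says the load is a function of the CPUs available in each cluster; the sketch below assumes the simplest case, a share of jobs proportional to CPU count, and the cluster names, CPU counts, job counts and the 5% tolerance are made-up values for the example.

```python
def expected_share(cpus):
    """Fraction of jobs each cluster should receive if load follows CPU count."""
    total = sum(cpus.values())
    if total <= 0:
        raise ValueError("no CPUs available")
    return {cluster: n / total for cluster, n in cpus.items()}


def load_is_balanced(cpus, jobs, tolerance=0.05):
    """True when every cluster's observed job share is within tolerance
    of its expected share (assumed proportional to its CPUs)."""
    total_jobs = sum(jobs.values())
    if total_jobs <= 0:
        raise ValueError("no jobs recorded")
    shares = expected_share(cpus)
    return all(
        abs(jobs.get(cluster, 0) / total_jobs - share) <= tolerance
        for cluster, share in shares.items()
    )


# Hypothetical numbers, not measurements from the testbed.
cpus = {"cluster1": 10, "cluster2": 30, "cluster3": 60}
even = {"cluster1": 95, "cluster2": 310, "cluster3": 595}
skewed = {"cluster1": 500, "cluster2": 250, "cluster3": 250}

balanced_even = load_is_balanced(cpus, even)
balanced_skewed = load_is_balanced(cpus, skewed)
```

With these numbers the first distribution stays within the tolerance and the second does not, since `cluster1` holds 10% of the CPUs but receives half of the jobs.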

LCG Test Suite
LCG has produced a “test suite” to allow for:
  – misconfiguration spotting
  – automated test procedure
  – interactive & nightly tests
  – performance evaluation, stress test
  – statistics about problems
ultimate goal: regression testing for middleware validation
a subset is the “site certification suite”
  – core functionalities
  – not exhaustive ...
test suite is continuously updated to reflect new issues
simultaneously we test the monitoring system
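
Regression testing, named above as the ultimate goal, amounts to comparing a new run against a known-good one. A minimal sketch follows, assuming results are stored as a mapping from test name to pass/fail; the function name `regressions` and the test names are hypothetical and do not describe the actual LCG test suite.

```python
def regressions(baseline, current):
    """Tests that passed in the baseline run but do not pass in the current one.

    A test that is missing from the current run is counted as a regression,
    so that silently dropped tests are not mistaken for passes.
    """
    return sorted(
        name for name, passed in baseline.items()
        if passed and not current.get(name, False)
    )


# Hypothetical nightly results, for illustration only.
last_certified = {"job-submit": True, "replica-copy": True, "proxy-renewal": False}
tonight = {"job-submit": True, "replica-copy": False}

broken = regressions(last_certified, tonight)
```

Here `proxy-renewal` is not reported because it was already failing in the baseline; only newly broken behaviour counts as a regression.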

LCG Test Suite (2)

C&T-Testbed Usage
a testbed is required for many essential tasks
  – testing
  – certification
  – integrating the release
  – experiment testing
  – problem resolution of the deployed system
contradictory activities
  – careful management required to maximize efficiency
  – fast-changing environment (upgrades, different tests, bug fixing, ...)
we would like to separate them into different testbeds
  – example: experiment testing could be independent from other activities

LCG-1 Lessons (C&T-team items)
packaging/installation is an issue
  – different formats from different sources given to us
configuration is too complex
  – too many interdependencies between different services
  – many parameters hardcoded in the configuration files
testing is a huge issue when technology is still under (heavy) development
  – around 200 bugs(!) opened in ~7 months by the C&T team alone
  – architecture, interoperability
we needed to form an internal team for fast bug fixing
  – not foreseen; the process within the mw projects was too slow → costly ... (3 people)

Timeline to LCG-2
[Timeline figure. Date marks: Sep/1, Sep/20 (LCG-1 C&T extension), Sep/25, Oct/1, Oct/10, Oct/20, Nov/1, Nov/20, Nov/30.
 Labels: LCG-1 C&T; LCG-1 deployment; LCG-1 upgrade tag; LCG-2 C&T; LCG-2 code cutoff; LCG-2 deployment possible; LCG-2 release; Big C&T; Small C&T; Globus study; Site + C&T test suites; GFAL; Experiments testing.]

Summary
middleware delivery to LCG was late
  – first reasonable set of middleware on the C&T testbed at the end of July
  – short time to turn development software into production software
  – still a lot to do
certification process for middleware in place with a demonstrable value
  – proven with LCG-1
open issues
  – software is not at production level yet
  – performance
  – configuration complexity
  – site verification is not exhaustive
in progress
  – operating procedures
  – scalability tests with complex jobs
  – simulating experiment production behaviour