Testing Infrastructure
Wahid Bhimji, Sam Skipsey

Outline: intro (what to test); existing testing frameworks; a proposal

Introduction: What to test?
DPM core releases as well as any new features. Testing should:
- Be automatic
- Use production environments
- Use real workloads
- Allow for stress testing

Features to test
- WebDAV
  – As a local access protocol
  – For WAN transfers
- NFS v4.1
- Xrootd, including redirection
- Everything else in Beta, e.g. DPM Nagios and the test suites themselves
- The "basics", i.e. what currently works and is used
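As an illustration of the kind of protocol-level read check this implies, here is a minimal sketch that opens the same test file via xrootd and via WebDAV/HTTPS using PyROOT. The hostname and file path are placeholders, and it assumes a ROOT build with the xrootd and Davix (HTTP/WebDAV) plugins available.

```python
# Minimal protocol read check (sketch): hostname and path are hypothetical,
# and a ROOT build with xrootd and Davix (HTTP/WebDAV) support is assumed.
import ROOT

URLS = [
    "root://dpm-test.example.ac.uk//dpm/example.ac.uk/home/atlas/test/sample.root",
    "https://dpm-test.example.ac.uk/dpm/example.ac.uk/home/atlas/test/sample.root",
]

for url in URLS:
    f = ROOT.TFile.Open(url)
    if not f or f.IsZombie():
        print("FAIL  %s" % url)
        continue
    # Read something small to force real I/O, then report basic metrics.
    keys = [k.GetName() for k in f.GetListOfKeys()]
    print("OK    %s  keys=%d  bytes_read=%d  read_calls=%d"
          % (url, len(keys), f.GetBytesRead(), f.GetReadCalls()))
    f.Close()
```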

Some already-available tools / tests / frameworks
- HammerCloud
  – Used now for both stress testing sites and blacklisting
  – Range of realistic user workflows plus stats such as CPU efficiency and event rate
  – Generally requires experiment software
- ATLAS HC IO tests (more info later)
  – Use the HC framework to submit arbitrary tests and collect more stats
- HCInABox
  – My own tarball of similar tests – not supported in any way
- Perfsuite / Nagios tests
  – Contain low-level tests of all required operations
  – Also have a version of a ROOT read test put in by Martin
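To give a feel for what a "low-level test of all required operations" amounts to, here is a rough sketch (not the actual perfsuite code): put, list, read back and delete a small file against a DPM endpoint. The URL and local paths are placeholders, and the gfal2 command-line utilities with the relevant protocol plugin are assumed to be installed.

```python
# Rough sketch of a perfsuite-style low-level operations test (not the real
# perfsuite code). The endpoint URL is a placeholder; gfal-copy/gfal-ls/gfal-rm
# from gfal2-util are assumed available.
import subprocess, sys, tempfile, time, os

ENDPOINT = "root://dpm-test.example.ac.uk//dpm/example.ac.uk/home/dteam/perftest.bin"

def run(cmd):
    t0 = time.time()
    rc = subprocess.call(cmd)
    print("%-60s rc=%d  %.2fs" % (" ".join(cmd), rc, time.time() - t0))
    return rc

local = tempfile.NamedTemporaryFile(delete=False)
local.write(os.urandom(1024 * 1024))  # 1 MB of test data
local.close()

ok = (run(["gfal-copy", "file://" + local.name, ENDPOINT]) == 0 and
      run(["gfal-ls", "-l", ENDPOINT]) == 0 and
      run(["gfal-copy", "-f", ENDPOINT, "file:///tmp/perftest.readback"]) == 0 and
      run(["gfal-rm", ENDPOINT]) == 0)
sys.exit(0 if ok else 1)
```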

Structure: ATLAS HC IO Tests
HammerCloud takes the test code, job script, release setting and datasets from SVN and submits tests to the sites; each test uploads its stats to an Oracle DB, which can be mined from the command line, a web interface or ROOT scripts. ROOT source is fetched at the site via curl.
Currently:
- A standard ATLAS dataset (DPD/AOD/ESD) is "preloaded" to sites (a new dataset can also be used)
- Both ROOT from the ATLAS release and ROOT HEAD are used
- Single tests are submitted regularly

ATLAS HC IO Tests
1. ROOT-based reading of a DPD (or AOD)
   – Somewhat like an ATLAS DPD analysis
   – Provides metrics from ROOT (number of reads, speed)
   – Happy to add detailed DPM metrics (e.g. parsing the text file that comes from the client when RFIO_TRACE is set)
2. Download the latest ROOT version and use it
   – Write a new file and then read it back
3. Athena (ATLAS framework) D3PD making
4. "Realistic analysis" test
   – Example physics code from a "workbook"
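A minimal sketch of what test (1) amounts to: loop over a tree with PyROOT and pull the read metrics from TFile. The input URL and tree name are placeholders, and setting RFIO_TRACE is shown only to indicate where the extra DPM/RFIO client detail mentioned above would come from.

```python
# Sketch of an HC IO-style ROOT read test (placeholder URL and tree name).
# RFIO_TRACE makes the RFIO client write a trace file that could be parsed
# for detailed DPM metrics; it only matters for rfio:// access.
import os, time
os.environ["RFIO_TRACE"] = "3"
import ROOT

URL = "root://dpm-test.example.ac.uk//dpm/example.ac.uk/home/atlas/test/NTUP.root"
TREE = "physics"  # hypothetical tree name

f = ROOT.TFile.Open(URL)
tree = f.Get(TREE)
t0 = time.time()
n = 0
for event in tree:          # read every event, like a simple DPD analysis loop
    n += 1
elapsed = time.time() - t0

print("events          : %d" % n)
print("event rate (Hz) : %.1f" % (n / elapsed if elapsed else 0.0))
print("bytes read      : %d" % f.GetBytesRead())
print("read calls      : %d" % f.GetReadCalls())
f.Close()
```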

A few plots

“ROOT reads”

HCInABox example
- Same test as (1), here run locally on a test disk server (from Dell)
- Can artificially create the load seen in production, e.g. submit to the batch system 100 simultaneous jobs doing direct RFIO reads against one filesystem (a sketch follows below)
- Plots compare a 128k RFIO buffer size against a 512k RFIO buffer size
- Test also in DPM Perfsuite
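A hedged sketch of that kind of load generation: submit N identical jobs to a PBS/Torque-style batch system, each copying the same file from the test disk server with rfcp. The queue name, hostnames, file path and use of qsub are assumptions about a local setup, not part of HCInABox itself.

```python
# Load-generation sketch (assumes a PBS/Torque-style batch system with qsub,
# the DPM/rfio client tools, and a placeholder test file on one filesystem).
import subprocess

N_JOBS = 100
TEST_FILE = "/dpm/example.ac.uk/home/atlas/test/loadtest.root"   # placeholder

JOB_SCRIPT = """#!/bin/bash
#PBS -q short
export DPNS_HOST=dpm-test.example.ac.uk
export DPM_HOST=dpm-test.example.ac.uk
rfcp %s /tmp/loadtest.$PBS_JOBID
rm -f /tmp/loadtest.$PBS_JOBID
""" % TEST_FILE

for i in range(N_JOBS):
    # qsub reads the job script from stdin when no script file is given
    qsub = subprocess.Popen(["qsub"], stdin=subprocess.PIPE)
    qsub.communicate(JOB_SCRIPT.encode())
```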

Just to mention: different purposes
A lot of the above was created for purposes different from what is needed here, e.g. for:
- Site tuning
- Experiment applications and data models
- Vendor-supplied storage
- Middleware and protocol comparisons
But it can still be useful…

A proposal: Volunteer sites
- Install a test DPM headnode and disk server:
  – Auto-update from a test repo
  – Runs the DPM Nagios / perfsuite tests
- Regular jobs running using the HC IO system:
  – Tests run on the production cluster and read from the test DPM
  – Easy (ish) for ATLAS sites; not sure about others
  – Some configuration work: a test "site" in ATLAS, or hacks in the job script
- Can be augmented by special stress tests submitted locally or via HammerCloud
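One possible shape for the auto-update-and-test step on such a volunteer node, to be run from cron, is sketched below. The repo name, package globs and check commands are placeholders for whatever the test repository and Nagios/perfsuite probes actually provide.

```python
# Hypothetical cron-driven updater for a volunteer test node: pull DPM
# packages from an assumed "dpm-test" yum repo, then run the site's
# Nagios/perfsuite checks. Repo name, package globs and check commands
# are placeholders, not the real ones.
import subprocess, sys

UPDATE = ["yum", "-y", "--enablerepo=dpm-test", "update", "dpm*", "dmlite*"]
CHECKS = [
    ["/opt/dpm-tests/run_perfsuite.sh"],      # placeholder for the perfsuite tests
    ["/opt/dpm-tests/run_nagios_probes.sh"],  # placeholder for the DPM Nagios probes
]

rc = subprocess.call(UPDATE)
for check in CHECKS:
    rc |= subprocess.call(check)
sys.exit(1 if rc else 0)
```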

DPM Knowledge Base
Wahid Bhimji

A large community to tap into…
SRM endpoints from the BDII (source: d.ac.uk/~wbhimji/SRMMonitoring/):

  SRM type   Number
  BeStMan        43
  Castor         19
  dCache         80
  DPM           250
  hdfs            1
  xrootd          3
  StoRM          54
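For reference, counts like these can be derived by querying a top-level BDII for storage elements and grouping by implementation name. A rough sketch follows, assuming the GLUE 1.3 schema and the ldapsearch client; the BDII host shown is the usual CERN alias, but any top-level BDII would do.

```python
# Rough sketch: count SRM storage elements per implementation by querying a
# top-level BDII over LDAP with ldapsearch (GLUE 1.3 schema assumed).
import subprocess, collections, re

out = subprocess.check_output([
    "ldapsearch", "-x", "-LLL",
    "-H", "ldap://lcg-bdii.cern.ch:2170",
    "-b", "o=grid",
    "(objectClass=GlueSE)",
    "GlueSEImplementationName",
]).decode(errors="replace")

counts = collections.Counter(
    m.group(1) for m in re.finditer(r"^GlueSEImplementationName: (.+)$", out, re.M))
for impl, n in counts.most_common():
    print("%-10s %d" % (impl, n))
```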

Various existing lists, wikis, blogs
- DPM Trac
- dpm-users-forum
- dpm-contrib: currently only contains the toolkit (see the talk by Sam earlier)
- GridPP:
  – Storage list, blog, wiki and weekly meeting
  – Individual site blogs, e.g. ScotGrid and NorthGrid
- Recently: DPM webinars

Some proposals…
- Aggregating blog articles, observations, wikis:
  – I'm not the best person to do this (!)
- Host code snippets, poorly tested tools etc.
- Contributions from the community into core DPM
  – Developers visiting CERN or working at "home"
- More of these workshops (!) at other locations (?)