Report on HEPiX Spring 2005, Forschungszentrum Karlsruhe, 9-13 May: Storage and data management; Batch scheduling workshop, 12-13 May.

Storage and data management (1)

Five miscellaneous talks, then a session on Grid service challenges and new software from the GD group.

ENSTORE at FNAL:
– Overview of data management at FNAL. Nothing new, except the need to checksum at all stages of file movement. Several (corrected) failures per week.

Performance results on Panasas, Lustre and AFS filesystems from CASPUR (Rome):
– Borrowed NAS switches plus DataDirect, Infortrend and Panasas (intelligent) disk trays.
– Compared I/O performance for different hardware and file systems; throughput ranged from 350 MB/s for AFS to 800 MB/s for Panasas.
– Different HEP workload types were also tested.
– Prices are now 2 to 4 Euro per GB for good performance.
– Comprehensive results – worth a closer look.

Storage and data management (2)

IBRIX Fusion filesystem for US CMS at FNAL:
– Evaluation started in autumn 2002 for shared, highly reliable 'user disk' space for US CMS: code, work areas and mini-DSTs.
– Chose IBRIX, a commercial software solution. Can use IBRIX clients or NFS mounts.
– Had major stability problems for 12 months but it is now in use. Will grow to 30 TB.

Xrootd infrastructure at RAL:
– Extended ROOT daemon, developed at SLAC/INFN.
– Single name space.
– Client connections are redirected, with load balancing, to the servers hosting the data.
– Very thorough failover architecture, including for open files.

FNAL SATA disk experiences:
– Does not give the same performance as more expensive architectures.
– Evaluate the total cost of ownership and select vendors carefully.
– You get what you pay for!

Grid Service Challenges (1)

US ATLAS Tier 1 facility at BNL, planning for SC3:
– Lessons from SC2 include the need for multiple TCP streams and parallel file transfers to fill up the network pipe (a rough bandwidth-delay calculation is sketched after this slide).
– Found sluggish parallel I/O with ext2 and ext3 filesystems; XFS was better.
– Goal is 150 MB/s from CERN to disk and 60 MB/s to tape.
– Use dCache in front of HPSS. Will need more than 2 tape drives.
– Will select a small number (2) of Tier 2 sites for one-way outbound transfers.

LCG Tier 2 at LAL/DAPNIA:
– Building a Tier 2 facility for simulation and analysis with LAL (Orsay), DAPNIA (Saclay) and LPNHE (Paris). Investing 1.7 M€ up to 2007 in 1500 kSi2K of CPU and 350 TB of disk.
– Efficient use and management of storage is seen as the main challenge.
– Will participate in SC3 (no details).
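A back-of-the-envelope sketch, in Python, of the "multiple TCP streams" point above; the 100 ms round-trip time and 2 MiB per-stream window are assumptions of mine, not figures from the BNL talk.

# Estimate per-stream TCP throughput (bandwidth-delay product) and the number
# of parallel streams needed to reach the SC3 disk-disk target of 150 MB/s.
# RTT and window size below are illustrative assumptions, not SC2 measurements.
rtt_s = 0.100                 # assumed CERN <-> BNL round-trip time, ~100 ms
window_bytes = 2 * 2**20      # assumed effective TCP window per stream, 2 MiB
target_MBps = 150             # SC3 disk-to-disk goal quoted in the talk

per_stream_MBps = window_bytes / rtt_s / 1e6   # ceiling for a single stream
streams_needed = target_MBps / per_stream_MBps

print(f"per-stream ceiling  : {per_stream_MBps:.1f} MB/s")
print(f"streams for 150 MB/s: {streams_needed:.1f}")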

Grid Service Challenges (2)

Ramping up the LCG Service:
– Jamie Shiers' talk, well delivered by Sophie Lemaitre.
– SC2 met its throughput goals, with more than 600 MB/s sustained for 10 days and with more sites than planned. It still cannot be called a service.
– For SC3, the gLite file transfer software and an SRM service need to be widely deployed.

Service Challenge 3 - Phases

High-level view:

Setup phase (includes the throughput test):
– 2 weeks sustained in July 2005; the "obvious target" is the GDB of July 20th.
– Primary goals: 150 MB/s disk-to-disk to the Tier 1s; 60 MB/s disk (T0) to tape (T1s). (The implied data volume is sketched after this slide.)
– Secondary goals: include a few named T2 sites (T2 -> T1 transfers); encourage the remaining T1s to start disk-to-disk transfers.

Service phase:
– September to end of 2005. Start with ALICE & CMS, add ATLAS and LHCb in October/November.
– All offline use cases except analysis. More components: WMS, VOMS, catalogs, experiment-specific solutions.
– Implies a production setup (CE, SE, …).
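As a rough sense of scale, my own arithmetic (not a figure quoted on the slides) for the data volume implied by sustaining the primary goals over the two-week setup phase:

# Rough data volume implied by the SC3 throughput-test goals; my arithmetic,
# not numbers from the talk.
disk_rate_MBps = 150        # disk-to-disk goal per Tier 1
tape_rate_MBps = 60         # disk (T0) to tape (T1) goal
seconds = 14 * 24 * 3600    # two weeks sustained

print(f"disk-to-disk volume: ~{disk_rate_MBps * seconds / 1e6:.0f} TB per Tier 1")
print(f"disk-to-tape volume: ~{tape_rate_MBps * seconds / 1e6:.0f} TB per Tier 1")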

New software from the GD group

Updates on talks already given at a recent GD workshop.

LHC File Catalog (J-P. Baud):
– Replaces the EDG Replica Location Service.
– Fixes the scalability and performance problems found in RLS.

gLite File Transfer Service (S. Lemaitre):
– Its only functionality is file transfer.
– Will be distributed with LCG but can run stand-alone.

Lightweight Disk Pool Manager (J-P. Baud):
– Similar to dCache but much easier to install and configure.
– Thoroughly tested.
– Intended for Tier 2 sites, to satisfy the gLite requirement for an SRM (Storage Resource Manager) interface.

Batch scheduling workshop

Aim: to enhance communication between users and developers of local resource scheduling (LRS) systems and Grid-level resource scheduling. HEP sites use a variety of systems that give fine-grained control over heterogeneous local resources and apply local policies. Grid scheduling often assumes that sites are homogeneous and equally available to all virtual organisations (which for HEP means the LHC experiments plus the grid developers). Can Grid-level scheduling reflect local scheduling, and if not, what should sites do?

Sessions

Thursday morning: How are local batch schedulers used at HEP sites?
– There were 9 site reports, which answered this question.

Thursday afternoon: Local and grid schedulers: status and plans.
– Two reports on site views of their problems with the Grid-LRS interfaces.
– Four commercial scheduler vendor presentations of their plans, but not relating to any HEP grid activities.
– An overview of Condor (the best supported model for an EGEE LRS).
– A report on an EGEE gLite interface to the LRS.

Friday morning: Developing a common batch interface.
– A GLUE schema status-and-plans talk, followed by a discussion.
– A proposal to standardize the sets of environment variables, related to local and Grid attributes, available to the LRS.

What/how are local batch schedulers used at HEP sites?

– 4 × Platform LSF: CERN, SLAC, JLab, BNL
– 2 × Sun Grid Engine: London e-Science, DESY
– 1 × OpenPBS: JLab Lattice QCD cluster (will drop LSF)
– 1 × Torque (an OpenPBS variant) with the Maui scheduler: RAL
– FNAL: home-grown, changing to Condor
– BNL: changing from LSF to Condor
– IN2P3: home-grown (BQS)

All sites have (very) heterogeneous hardware. All sites use (or will use) the same farms for grid and non-grid work. Most have local groups and grid VO allocations and use a fairshare mechanism (a generic sketch of the idea follows below). All sites have CPU-time-based queues but allow other resources to be specified; the most common are work space and memory.
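A generic sketch of the fairshare mechanism mentioned above: groups whose recent usage is below their allocated share get a priority boost, groups above it are penalised. This is illustrative only and does not reproduce the actual formula of LSF, Sun Grid Engine, Maui or BQS; the numbers are made up.

# Generic fairshare prioritisation sketch (illustrative, not any specific scheduler).
allocations  = {"atlas": 0.40, "cms": 0.30, "lhcb": 0.20, "local": 0.10}  # target shares
recent_usage = {"atlas": 0.55, "cms": 0.20, "lhcb": 0.15, "local": 0.10}  # fraction of CPU used

def fairshare_priority(group, weight=100.0):
    """Higher value = scheduled sooner; zero when usage matches the allocation."""
    return weight * (allocations[group] - recent_usage[group])

for g in allocations:
    print(f"{g:6s} priority {fairshare_priority(g):+6.1f}")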

Local and grid schedulers: status and plans (1)

Personal 'musings' from Jeff Templon (NIKHEF):
– We need to be able to give local allocations among VOs, while allowing each to use what the others leave idle. We also need a small number of high-priority operations jobs to run, and maybe cycle scavengers.
– Users range from polite to sneaky. Some experiments over-submit pilot jobs (the real job is in a database) to many sites, which then spend many minutes scheduling large numbers of do-nothing jobs.
– If we are only running one VO's jobs, then another VO would get the next free slot, but how do we publish this fact to the Grid?
– Efficient usage of local resources will need reasonable job run-time estimates, in normalised units, attached to jobs.
– We need self-disabling sites (or worker nodes at a site) to stop the problem of 'black holes' eating job queues with serial failures.

Local and grid schedulers: status and plans (2)

BQS problems and solutions for the Grid (IN2P3):
– A well-structured list of the problems currently seen with the Grid-BQS interfaces, but common to most sites. Local solutions are mostly temporary, pending hoped-for Grid improvements.
– The Resource Broker does not pass important scheduling information, such as the requested CPU or memory, to the Compute Element (the BQS interface). IN2P3 assumes the maximum resources are required, but this reduces overall efficiency.
– Grid certificates map to local team accounts. Fast traceback to the real user is needed for resolving problems.
– Grid job stdout and stderr files are hidden under unique subdirectories, making local problem debugging difficult. They ask the RB to somehow indicate these names.

Local and grid schedulers: status and plans (3)

Platform LSF:
– The current release is tested to support 5000 execution hosts, with many jobs in the queue and 100 users simultaneously using the server.

IBM LoadLeveler:
– The main new feature is advance reservation of resources. Mainly used for weather forecasting.

Sun Grid Engine:
– The commercial product is now called N1 Grid Engine 6. It is not clear whether this is the same code line as the open source version. Advance reservation has also been added.

PBSPro:
– Has also released advance reservation.

Commercial vendors respond to the requirements of their (high-paying) customers!

Local and grid schedulers: status and plans (4)

Condor and Grid challenges:
– This was an overview of Condor and some of its plans.
– A very rich R&D programme with many components (9 million lines of code).
– Well supported, with 35 staff, many of them permanent.
– Condor-G has been released; it sits between the user application and the grid middleware. Condor worker-node pools sit between the middleware and the local fabric.
– Solutions to the mismatch between local and grid scheduler capabilities involve 'gliding in' a Condor startd under the middleware, or a whole new job manager called Stork; however, I could not relate the resulting architecture diagram to our grid architecture.
– Please read the talk at …
– It was followed by a long, rambling discussion.

Local and grid schedulers: status and plans (last)

The last talk was on BLAAHP:
– The EGEE Batch Local ASCII Helper, from the EGEE JRA1 group.
– A gLite component used by Condor-C and CREAM (?) to manage jobs on the batch systems. It could easily be interfaced to other systems.
– It specifies 3 external scripts per local batch system (a hedged sketch of such a script set follows this slide):
  - Submit a job, with parameters such as the queue name; returns a status.
  - Query job status; must return the status of running and finished jobs (for some time) and distinguish between successful and failed jobs.
  - Cancel a job, returning success or failure.
– To address overloading by status queries they have developed a caching server that monitors the batch system's job logs.
– To address job proxy expiry they have developed a proxy receiver, started by the job wrapper on each worker node, which listens for an updated proxy from the CE.
– This was an important talk, but the only one from gLite. How does it relate to the GLUE schema (see next talk), and what does it imply for EGEE job submission User Interfaces and Resource Brokers?
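To make the three-script contract concrete, here is a minimal sketch of what such helpers could look like for a Torque/OpenPBS-style batch system (one of the systems in the site reports). The function names, return conventions and use of qsub/qstat/qdel are my own illustration, not the real BLAAHP scripts.

# Illustrative submit/status/cancel helpers in the spirit of the three external
# scripts described in the talk. NOT the actual BLAAHP code; it simply wraps
# basic Torque/OpenPBS commands to show the contract.
import subprocess

def submit(job_script: str, queue: str) -> str:
    """Submit a job to the given queue; return the batch-system job id."""
    out = subprocess.run(["qsub", "-q", queue, job_script],
                         capture_output=True, text=True, check=True)
    return out.stdout.strip()          # e.g. "12345.pbsserver"

def status(job_id: str) -> str:
    """Return a coarse state; the real helper must also report recently
    finished jobs and tell success from failure (e.g. by reading job logs)."""
    res = subprocess.run(["qstat", job_id], capture_output=True, text=True)
    return "QUEUED_OR_RUNNING" if res.returncode == 0 else "FINISHED_OR_UNKNOWN"

def cancel(job_id: str) -> bool:
    """Cancel the job; report success or failure."""
    return subprocess.run(["qdel", job_id]).returncode == 0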

Developing a common batch interface

GLUE schema: status and plans (Laurence Field):
– GLUE = Grid Laboratory Uniform Environment.
– Started in 2002 with no real users; the design had no real VO concept.
– Describes the attributes and groupings that are published to the web.
– A job is sent to the CE (usually one per site) with the best ETT (estimated traversal time of a job). The ETT calculation needs improving (a naive version is sketched after this slide to show why).
– Uses dynamic plugins to interface to each LRS. Volunteer sites were asked to support the dynamic plugins; CERN will do the LSF ones.
– The next version will allow multiple VOs per queue, with separate ETT calculations. It still assumes homogeneous hardware.
– Work on a new version will start in November 2005, and it does not have to be backwards compatible – give your input!
– Observation: the relationship with gLite is unclear to me. I learned later that EGEE will probably support GLUE. There will be parallel EDG and gLite job management components that can coexist at a site (but maybe not on the same grid server hosts).
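To illustrate why the ETT calculation needs improving, here is a deliberately naive estimate of the kind that breaks down for multi-VO queues and heterogeneous hardware. The formula is my own illustration, not the one used by the GLUE information providers.

# Naive "estimated traversal time" (ETT) sketch: how long a newly submitted
# job might wait on a CE, judged only from queue occupancy. It ignores VO
# shares and differing CPU speeds - exactly the assumptions criticised above.

def naive_ett(waiting_jobs: int, free_slots: int, total_slots: int,
              avg_job_hours: float = 12.0) -> float:
    """Hours until a new job could plausibly start."""
    if free_slots > 0:
        return 0.0                                  # a slot is free right now
    # assume the backlog drains in uniform waves of total_slots jobs
    return (waiting_jobs / total_slots + 1) * avg_job_hours

print(naive_ett(waiting_jobs=300, free_slots=0, total_slots=600))  # -> 18.0 hours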

Common batch interface: some discussion points

– There are mismatches between local and grid-level scheduling, problems with user identity mapping, and exposed resources that are not acted on.
– The industrial partners did not see any fast solutions coming from themselves.
– Abusive users should be punished (although if the ETT predictions were correct there should be no need for abuse!).
– Should jobs be pulled or pushed to worker nodes? There was more enthusiasm for pull, but not yet.
– The Condor ClassAds mechanism (a language to describe resources and policy) should be supported. Sites would like, for example, to publish local resources per VO.
– HEPiX encourages resource broker developers and LRS managers to find a forum in which to meet.

Proposal for standardizing the working environment for an LCG/EGEE job

From the CC-IN2P3 Grid computing team. They proposed several sets of environment variables, and a naming convention, to be implemented on all LCG/EGEE Compute Elements:
– POSIX base: HOME, PATH, PWD, SHELL, TMPDIR, USER
– Globus: GLOBUS_LOCATION, GLOBUS_PATH, …
– EDG: EDG_LOCATION, EDG_WL_JOBID, …
– LCG: LCG_LOCATION, LCG_CATALOG_TYPE, …
– Middleware-independent: GRID_WORKDIR, GRID_SITENAME, GRID_HOSTNAME, …

This was welcomed by all as part of solving our problems (e.g. one variable contains the attributes of the certificate of the submitting user, such as their address). They will distribute a document to site administrators and applications managers for comments, aiming to start deployment at the end of June. A small sketch of how a job might use these variables follows below.
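As an illustration of how a job script could rely on the proposed convention once deployed; only the variable names come from the proposal above, the grouping and the fallback logic are my own example.

# Job-side check for the proposed LCG/EGEE environment variables (illustrative).
import os

PROPOSED = {
    "posix":  ["HOME", "PATH", "PWD", "SHELL", "TMPDIR", "USER"],
    "globus": ["GLOBUS_LOCATION", "GLOBUS_PATH"],
    "edg":    ["EDG_LOCATION", "EDG_WL_JOBID"],
    "lcg":    ["LCG_LOCATION", "LCG_CATALOG_TYPE"],
    "middleware-independent": ["GRID_WORKDIR", "GRID_SITENAME", "GRID_HOSTNAME"],
}

missing = [v for group in PROPOSED.values() for v in group if v not in os.environ]
if missing:
    print("CE does not (yet) implement the full proposal; missing:", missing)

# e.g. prefer the middleware-independent work directory, fall back to TMPDIR
workdir = os.environ.get("GRID_WORKDIR", os.environ.get("TMPDIR", "/tmp"))
print("job working directory:", workdir)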

My conclusions from the batch workshop

– We have a good understanding of what Local Resource Schedulers can do and how they are being used.
– EGEE JRA1 should be encouraged to support the GLUE schema. Both sets of developers should communicate.
– HEP sites should volunteer to support GLUE schema interfaces to their LRS and provide input for the new version.
– The workshop in itself did not enhance communication with the grid developers as much as hoped, but we must make sure this happens (e.g. as at the recent EGEE/LCG Grid Operations workshop in Bologna).