Successful Common Projects: Structures and Processes
WLCG Management Board, 20th November 2012
Maria Girone, CERN IT (Experiment Support)

Historical Perspective
The original LCG-EIS model was primarily experiment-specific, with the team having a key responsibility within one experiment.
– Examples of cross-experiment work existed, but they were not the main thrust.
From the beginning of EGI-InSPIRE (SA3.3, Services for HEP), a major transition has taken place: a focus on common solutions and shared expertise.
– A strong and enthusiastic team.
This has led to a number of notable successes, covered later.

The Process
Identify areas of interest between grid services and the experiment communities that would benefit from:
– common tools and services
– common procedures
Facilitate their integration into the experiments' workflows.
Save resources by having a central team with knowledge of both IT and the experiments.
Key element: regular discussions with computing management, agreement on priorities, and review of achievements against plans.

Structure of a Common Solution
A common solution is an interface layer between the common infrastructure elements and the truly experiment-specific components:
– Higher layer: experiment environments.
– Box in between: common solutions, i.e. higher-level services that translate between experiment-specific elements and the common infrastructure components and interfaces.
– Lower layer: common grid interfaces and site service interfaces.
A lot of effort is spent in these layers, so there are significant potential savings of effort in commonality – not necessarily in implementation, but in approach and architecture.
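To make the layering concrete, here is a minimal sketch in Python of how a common service can sit between a shared monitoring source and experiment-specific mappings. The class and function names are hypothetical illustrations, not the actual IT-ES code.

```python
from abc import ABC, abstractmethod
from collections import defaultdict


class ExperimentAdapter(ABC):
    """Experiment-specific layer: knows how the experiment maps
    physical files onto logical objects such as datasets."""

    @abstractmethod
    def dataset_for_file(self, lfn: str) -> str:
        ...


class AtlasAdapter(ExperimentAdapter):
    def dataset_for_file(self, lfn: str) -> str:
        # Placeholder logic; a real adapter would query the ATLAS catalogues.
        return lfn.rsplit("/", 2)[-2]


class CmsAdapter(ExperimentAdapter):
    def dataset_for_file(self, lfn: str) -> str:
        # Placeholder logic; a real adapter would query the CMS catalogues.
        return "/" + "/".join(lfn.strip("/").split("/")[1:4])


class CommonService:
    """Common layer: identical logic for every experiment; all
    experiment knowledge is confined to the adapter it is given."""

    def __init__(self, adapter: ExperimentAdapter):
        self.adapter = adapter
        self.accesses = defaultdict(int)

    def record_file_open(self, lfn: str) -> None:
        self.accesses[self.adapter.dataset_for_file(lfn)] += 1


# Usage: the same service class serves both experiments, only the plug-in differs.
atlas_service = CommonService(AtlasAdapter())
cms_service = CommonService(CmsAdapter())
```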

Data and Workload Management

Site Commissioning and Availability

Summary
Integration of services using common pools of expertise allows optimization of resources on both sides:
– infrastructure and grid services (FTS, CE, SE, VMs, clouds, etc.)
– workflow and higher-level services (PanDA, dynamic data placement, site commissioning and availability, etc.)
Common solutions result in fewer services, better integration testing, and more stable and consistent operations.
The LHC schedule presents a good opportunity for technology changes during LS1.
Key process: regular discussions with computing management, agreement on priorities, and review of achievements against plans.
Key benefit: successfully deployed common solutions have immediately saved integration effort, and will also save operations effort.

Examples of Common Projects

Data Popularity & Cleaning
Experiments want to know which datasets are used, how much, and by whom.
– First idea and implementation by ATLAS, followed by CMS and LHCb.
Data popularity exploits the fact that all experiments open files and access storage, so the monitoring information can be collected in a common way using generic, common plug-ins.
The experiments have systems that identify how those files are mapped onto logical objects such as datasets and reprocessing or simulation campaigns.
(Diagram: file opens and reads feed a record of files accessed, users and CPU used; the experiment booking systems map files to datasets.)
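As a rough illustration of the aggregation such a popularity service performs, here is a sketch; the record format and function name are hypothetical, not the actual service schema.

```python
from collections import defaultdict

# Hypothetical monitoring records, one per file open, as could be
# produced by generic storage and job monitoring plug-ins.
access_records = [
    {"dataset": "data12_8TeV.AOD", "user": "alice", "cpu_hours": 3.2},
    {"dataset": "data12_8TeV.AOD", "user": "bob",   "cpu_hours": 1.1},
    {"dataset": "mc12_8TeV.ttbar", "user": "alice", "cpu_hours": 7.5},
]


def popularity_by_dataset(records):
    """Aggregate file-level accesses into per-dataset popularity:
    number of accesses, number of distinct users and total CPU used."""
    summary = defaultdict(lambda: {"accesses": 0, "users": set(), "cpu_hours": 0.0})
    for rec in records:
        entry = summary[rec["dataset"]]
        entry["accesses"] += 1
        entry["users"].add(rec["user"])
        entry["cpu_hours"] += rec["cpu_hours"]
    return {ds: {"accesses": e["accesses"],
                 "users": len(e["users"]),
                 "cpu_hours": e["cpu_hours"]}
            for ds, e in summary.items()}


print(popularity_by_dataset(access_records))
```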

Popularity Service
Used by the experiments to assess the importance of computing processing work:
– to decide when the number of replicas of a sample needs to be adjusted, either up or down
– to suggest obsolete data that can be safely deleted without affecting analysis
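A simple policy of the kind this popularity information enables is sketched below; the thresholds and the function are illustrative, not the experiments' actual rules.

```python
def suggest_replica_count(accesses_last_90d, current_replicas,
                          min_replicas=1, max_replicas=5):
    """Suggest a new replica count for a sample from its recent
    popularity: busy samples gain a copy, idle ones lose one."""
    if accesses_last_90d == 0:
        # Unused for the whole window: candidate for reduction.
        return max(min_replicas, current_replicas - 1)
    if accesses_last_90d > 500 * current_replicas:
        # Heavily accessed relative to the copies available: add one.
        return min(max_replicas, current_replicas + 1)
    return current_replicas


# Example: a heavily used sample with two copies gets a third one suggested.
print(suggest_replica_count(accesses_last_90d=2000, current_replicas=2))  # -> 3
```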

Site Cleaning Service
The Site Cleaning Agent is used to suggest obsolete or unused data that can be safely deleted without affecting analysis.
The information about space usage is taken from the experiment's dedicated data management and transfer system.
High savings in terms of storage resources: 2 PB (20% of the total managed space).
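A minimal sketch of the selection step such an agent could perform, assuming last-access times and replica sizes are available from the data management system; the field names and thresholds are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical replica catalogue entries at one site.
replicas = [
    {"dataset": "data11_7TeV.AOD",  "size_tb": 120.0, "last_access": datetime(2012, 1, 15)},
    {"dataset": "data12_8TeV.AOD",  "size_tb": 300.0, "last_access": datetime(2012, 10, 2)},
    {"dataset": "mc11_7TeV.dijets", "size_tb": 80.0,  "last_access": datetime(2011, 12, 20)},
]


def deletion_candidates(replicas, now, idle_days=180):
    """Return replicas not accessed for `idle_days`, oldest first,
    as suggestions for deletion (the decision stays with the experiment)."""
    cutoff = now - timedelta(days=idle_days)
    idle = [r for r in replicas if r["last_access"] < cutoff]
    return sorted(idle, key=lambda r: r["last_access"])


for r in deletion_candidates(replicas, now=datetime(2012, 11, 20)):
    print(r["dataset"], r["size_tb"], "TB, last accessed", r["last_access"].date())
```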

EOS Data Popularity
Allows the experiments to verify that EOS and CPU resources at CERN are used as planned.
First deployed use case: monitor the file usage of the Xrootd-based EOS instance at CERN for ATLAS and CMS.
To be extended to:
– the rest of the ATLAS and CMS storage federations
– assessing data popularity also for batch/interactive job submissions
– helping to manage the user space on a site
(Plot: weekly amount of data read for the most popular ATLAS projects/data types accessed from EOS, February to August.)

HammerCloud
HammerCloud is a common testing framework, first developed for ATLAS (PanDA) and then exported to CMS (CRAB) and LHCb (DIRAC).
Common layer (built on Ganga) for functional testing of CEs and SEs from a user perspective.
Continuous testing and monitoring of site status and readiness; automatic site exclusion based on defined experiment policies.
Same development, same interface, same infrastructure → less workforce needed to maintain it.
(Diagram: the testing and monitoring framework sits between the distributed analysis frameworks and the computing and storage elements.)
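The automatic exclusion step can be pictured as follows; the policy values and the function name are hypothetical, each experiment defines its own policy in the real system.

```python
def evaluate_site(test_results, min_tests=10, min_success_rate=0.8):
    """Decide from recent functional-test results whether a site should
    stay in the analysis whitelist or be excluded.

    `test_results` is a list of booleans (True = test succeeded)."""
    if len(test_results) < min_tests:
        return "unknown"  # not enough statistics yet
    success_rate = sum(test_results) / len(test_results)
    return "online" if success_rate >= min_success_rate else "excluded"


# Example: 7 successes out of 12 recent tests -> the site gets excluded.
recent_tests = [True] * 7 + [False] * 5
print(evaluate_site(recent_tests))
```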

HammerCloud (continued)
Allows sites to make reconfigurations and then test the site with realistic workflows to evaluate the effectiveness of the change.
Reporting is granular enough to identify which of the site services has gone bad.
HammerCloud is being adapted as a cloud infrastructure testing and validation tool:
– CERN IT Agile Infrastructure testbed, HLT farms

Common Analysis Framework
In the spring, IT-ES proposed to look at commonality in the analysis submission systems:
– using PanDA as the common workflow engine
– investigating elements of GlideinWMS for the pilot
90% of the code CMS used to submit to its experiment-specific workflow engine could be reused when submitting to PanDA.
Feasibility study presented at CHEP; a program of work was defined for a Proof-of-Concept (PoC).
Having people familiar with both systems working together was critical.
The PoC prototype (due by end 2012) is ahead of schedule; a dedicated workshop is planned for December.

Dedicated Resources for the PoC
IT-ES has invested resources with expertise in both experiments' workflows: 2 FTE (CMS) + 1 FTE (ATLAS).
ATLAS: very constructive interaction with the PanDA developers (pilot, factory, server and monitoring) for the work on system modularity.
CMS: user data handling and GlideinWMS expertise.

Analysis Framework Diagram
(Architecture diagram spanning the client side, server side and grid resources: a VO-specific client with an optional client service; the PanDA server, PanDA pilot factories and GlideinWMS sending glideins to the computing elements; data management services; PanDA pilots running job transformations with a data adaptor and gLExec; and the PanDA monitor and Dashboard providing historical views. The legend distinguishes PanDA components, GlideinWMS components, and VO-specific/external components.)

Status
PanDA services have been integrated into the CMS-specific analysis computing framework:
– jobs submitted through the CMS-specific interface (CRAB3) on a dedicated testbed (4 sites)
– user data transfers managed by CMS-specific tools (Asynchronous Stage Out)
GlideinWMS for the CMS workflow is still to be included; this will profit from the ATLAS experience ("Feasibility of integration of GlideinWMS and PanDA").
Also now working on direct gLExec-PanDA integration.

First Results
Prototype phase completed.
Functionality validated, following CMS requirements, in a multi-user environment.
Full integration into the CMS workflow during LS1.

Agile Infrastructure Testing
1. Boot up a batch cluster in the CERN OpenStack infrastructure.
2. Integrate it with the experiments' workload management frameworks (ATLAS PanDA, CMS glideins).
3. Run experiment workloads on the cluster.
4. Share procedures and image configuration between ATLAS and CMS.
(Diagram: a Condor head node (CernVM with ganglia, httpd, condor, cvmfs) and CernVM worker nodes (ganglia, condor, cvmfs) booted on the CERN AI OpenStack; software delivered via CernVM-FS, jobs pulled from the experiment workload management frameworks, input and output data on the CERN EOS storage element.)
A sketch of how the worker nodes could be contextualised at boot is shown below.
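To give a flavour of the cluster boot and contextualisation, here is a minimal sketch; the cloud-config content, the `boot_instance` stub and all names are illustrative placeholders, not the actual CERN AI procedure.

```python
# Cloud-init style user data passed to each CernVM worker at boot:
# set up the CernVM-FS client and join the Condor pool of the head node.
USER_DATA_TEMPLATE = """#cloud-config
runcmd:
  - cvmfs_config setup
  - echo "CONDOR_HOST = {condor_head}" >> /etc/condor/condor_config.local
  - echo "START = TRUE" >> /etc/condor/condor_config.local
  - service condor start
"""


def boot_instance(name, image, flavor, user_data):
    """Stub standing in for the real OpenStack API call
    (e.g. via python-novaclient); here it only prints its intent."""
    print(f"booting {name} from image {image} with flavor {flavor}")
    return name


def boot_workers(n_workers, condor_head, image="cernvm-batch", flavor="m1.large"):
    """Boot n_workers identical worker VMs, all contextualised to
    report to the same Condor head node."""
    user_data = USER_DATA_TEMPLATE.format(condor_head=condor_head)
    return [boot_instance(f"worker-{i:03d}", image, flavor, user_data)
            for i in range(n_workers)]


boot_workers(2, condor_head="condor-head.example.org")
```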

First Results
(Monitoring plots from November showing job counts on the test clusters: Finished: 8630, Failed: 57; Finished: 1118, Failed: 89.)
Currently ramping up the size of the clusters; running HammerCloud and test jobs.
Next steps:
– operate a standard production queue on the cloud
– analyze HammerCloud metrics, compare with production queues and provide feedback