Client/Server Grid applications to manage complex workflows
Filippo Spiga* on behalf of the CRAB development team
* INFN Milano Bicocca (IT)

Outline 2
– Science Gateways and Client/Server computing
– Client/server approach for CMS distributed analysis
– Usage statistics and trends
– Considerations, remarks and future works

Science Gateways & Client/Server Computing 3
A Science Gateway is a community-developed set of tools, applications, and data collections that are integrated via a portal or a suite of applications.
Client/server computing is a distributed application architecture that partitions tasks or workloads between service providers (servers) and service requesters (clients).
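As a toy illustration of this partition (not how CRAB itself is implemented), the server can expose a single remote operation that hides all the grid-side complexity; Python's built-in XML-RPC modules are enough for a sketch, with the operation name and payload invented here:

```python
# Server side: owns the complex work (middleware, job state, retries).
# Purely illustrative; CRAB does not use XML-RPC.
from xmlrpc.server import SimpleXMLRPCServer

def submit(job_spec):
    # ... talk to the grid middleware, record state, schedule retries ...
    return f"queued job for dataset {job_spec['dataset']}"

server = SimpleXMLRPCServer(("localhost", 8000), allow_none=True)
server.register_function(submit)
server.serve_forever()
```

A client then only needs `ServerProxy("http://localhost:8000").submit({"dataset": "/Example/Data/RECO"})` from `xmlrpc.client`; all grid-specific logic stays on the server.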

Science Gateways & Client/Server Computing 4
Science gateways and client/server applications…
– share the same fundamental ideas
– address the same purposes
– are realized using different technologies
– are deployed in different ways
– interact with users through different interfaces
The choice between them depends on the target community (both users and developers).

Motivations & Advantages (for users) 5
In the context of HEP computing, client/server applications can…
– automate (as much as possible) complex analysis workflows
– allow more advanced job monitoring with centralized front ends
– hide the configuration complexity of heterogeneous grid environments
– hide implementation details from the users
– resubmit failed jobs automatically (sketched below)
– enforce (strong) centralized policies
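A minimal sketch of the automatic-resubmission idea, assuming an invented job model and status names; the real retry policy lives in the server configuration:

```python
from dataclasses import dataclass

@dataclass
class Job:
    job_id: str
    status: str = "Submitted"
    retries: int = 0

MAX_RETRIES = 3  # assumed policy, not the real server default

def resubmit_failed(jobs, resubmit):
    """One server-side pass: re-queue aborted jobs without user action."""
    for job in jobs:
        if job.status == "Aborted" and job.retries < MAX_RETRIES:
            job.retries += 1
            job.status = "Submitted"
            resubmit(job)  # hand back to the submission component
```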

Motivations & Advantages (for operators and developers) 6
In the context of HEP computing, client/server applications can…
– extend/integrate the functionalities of the middleware
– make updates easier to manage centrally
– improve the support operators can give to users
– avoid destructive congestion when interacting with the grid (see the pacing sketch below)
– improve the scalability of the entire “computing model”
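One way to read the congestion point: the server paces grid interactions on behalf of all users instead of letting every client hit the services directly. A hypothetical pacing sketch:

```python
import time

def throttled_submit(jobs, submit, max_per_minute=60):
    """Space out submissions so thousands of user jobs do not hit the
    grid services at once; real servers also batch via bulk submission."""
    interval = 60.0 / max_per_minute
    for job in jobs:
        submit(job)
        time.sleep(interval)
```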

The experience with the CRAB tool in CMS 7
CRAB is a Python tool to create, submit and manage CMS analysis jobs in a Grid environment. A single tool addresses different applications/purposes:
– CMSSW analysis over the Grid
– private MC production
– complex workflow automation
CRAB started as a standalone application and evolved into a client/server application (in production since July 2007).
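For context, a CRAB 2 job was driven by a `crab.cfg` file along these lines; the values are placeholders and exact key names varied between releases (setting a server name is what switched the client into client/server mode):

```ini
[CRAB]
jobtype     = cmssw
scheduler   = glite
server_name = some_crab_server   ; placeholder: enables client/server mode

[CMSSW]
datasetpath            = /SomeDataset/SomeEra/RECO
pset                   = my_analysis_cfg.py
total_number_of_events = -1
events_per_job         = 100000

[USER]
return_data     = 0
copy_data       = 1
storage_element = some_T2_storage_element   ; placeholder
```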

Standalone vs Client-Server (workflow point of view) 8

Server architecture 9
– Agent-based model
– Publish&subscribe message service (sketched below)
– Most components are multi-threaded
– Transparent support for multiple storage systems (SEAPI) and multiple middlewares (BossLite)
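A toy sketch of the agent-based, publish&subscribe pattern, with component and message names invented for illustration:

```python
import queue
import threading
import time

# One queue per message type: publishing puts a message on the topic,
# subscribing blocks on it from a component (agent) thread.
bus = {"SubmitJob": queue.Queue()}

def submitter_agent():
    while True:
        msg = bus["SubmitJob"].get()            # subscribe
        print(f"submitting task {msg['task']}")

threading.Thread(target=submitter_agent, daemon=True).start()
bus["SubmitJob"].put({"task": "analysis_001"})  # publish
time.sleep(1)  # demo only: let the agent consume before exiting
```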

Components interaction example (submission) 10
If a component/thread crashes, the system continues to work; the other services provided by the server are not compromised. At worst, the quality of service degrades gracefully.
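A sketch of how this isolation can be obtained: each component runs in its own supervised thread, so a crash restarts that component while the rest of the server keeps serving (the restart policy here is invented):

```python
import threading
import time

def supervised(component, restart_delay=5):
    """Run a component in its own thread; if it raises, restart it after
    a delay instead of taking the whole server down."""
    def loop():
        while True:
            try:
                component()
            except Exception as exc:
                print(f"{component.__name__} died ({exc}); restarting")
                time.sleep(restart_delay)
    threading.Thread(target=loop, daemon=True).start()
```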

CRAB deployment on the Grid infrastructure 11
CRAB deployment strategy for CMS:
– enforce fault tolerance
– avoid single points of congestion
– gLite is the main middleware used
– periodic challenges are performed to test and stress the infrastructure
Several CRAB Analysis Servers (CRAB-AS) are in production now:
– located at CERN (CH), UCSD (US) and Bari (IT)
– each physics group can deploy its own unofficial server
The Analysis Operations team (AnaOps) manages and monitors the services.

Statistics: starting point 12
Source of statistics: the CMS Dashboard. Focus on the comparison between the standalone and client/server approaches:
– trend analysis from Jan 2008 to Feb 2010
The statistics are not “perfect”: testing jobs submitted by CRAB developers are sometimes counted as analysis jobs…
Only analysis jobs at the T2 level were considered.

CRAB Standalone vs Server (2008–2009) 13
Source: CMS Dashboard, 27/03/2010

CRAB Standalone vs Server (last 6 months, Sep ’09 – Feb ’10) 14
54% of submissions were made through the server.
Source: CMS Dashboard, 27/03/2010

CRAB Standalone vs Server (2009) 15
Source: CMS Dashboard, ~02/2010 (the plot marks the restart of the LHC)
Number of distinct CRAB users: 150–200 per day
Maximum peak of distinct users: ~250

Considerations & Remarks 16
The architectural aspects of the software are the key factors behind its success:
– good performance achieved
– good stability and reliability
Within the CMS community, adoption of the client/server approach is increasing:
– good feedback from users
– good quality of service reached
The only (relevant, but known) problems are related to data movement: stage-out remains a critical operation across the Grid, influenced by several factors.

Improvements & Future works 17
Consolidate the current architecture:
– change the interaction with the gLite middleware (CLI instead of API)
– better strategies for stage-out operations
– maintain interoperability with all supported middlewares
– reduce the number of aborted jobs
– improvements for intelligent resubmission
New framework for all CMS computing: WMCore
– one application targets the online (T0), official production (T1) and user analyses (T2)
– communications based on RESTful web services (see the sketch below)
– too young to go to production (expected for…)
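For illustration only, a RESTful interaction in the WMCore spirit might look like the following; the endpoint, resource name and payload are invented, not the real WMCore or CRAB API:

```python
import json
import urllib.request

# Hypothetical REST call: ask the server to create a new analysis task.
req = urllib.request.Request(
    "https://cmsweb.example.org/crabserver/task",  # invented endpoint
    data=json.dumps({"workflow": "my_analysis"}).encode(),
    headers={"Content-Type": "application/json"},
    method="POST",
)
with urllib.request.urlopen(req) as resp:
    print(json.loads(resp.read()))
```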

THANK YOU FOR YOUR ATTENTION 18
Questions?
Special thanks to Daniele Spiga (CRAB), Fabio Farina (CRAB), Mattia Cinquilli (CRAB) and Julia Andreeva (CMS Dashboard)

The CRAB history (backup slide) 19
Timeline (from 2004):
– CRAB_1_X_X
– CRAB_2_X_X (10/2007)
– CRAB_2_7_1 (03/2010)
– CRABSERVER_0_0_1 (07/2007)
– CRABSERVER_1_0_X (05/2008)
– CRABSERVER_1_1_0 (11/2009)
– CRABSERVER_1_1_1 (03/2010)