CMS Experience with Indigo DataCloud


1 CMS Experience with Indigo DataCloud
Future challenges for HEP Computing, CNAF
Sonia Taneja, Stefano dal Pra: INFN-CNAF; Marica Antonacci, Giacinto Donvito: INFN-BA; Tommaso Boccali: INFN-PI; Daniele Spiga: INFN-PG

2 Outline
Brief introduction to INDIGO-DataCloud (slides from Davide Salomoni and Giacinto Donvito)
The CMS experience using INDIGO solutions: motivations and objectives, architecture, status, next steps

3 The Project: INDIGO-DataCloud

4 INDIGO “simplified” architecture

5 Status of the INDIGO Project

6 INDIGO: The building blocks

7 The CMS experience using INDIGO…
…or how we use (some of the) INDIGO-developed solutions to integrate a well-consolidated (and complex) computing model with modern technologies.
Our challenge is to develop a solution for generating an on-demand, container-based cluster to run CMS analysis workflows, in order to allow:
The effective use of opportunistic resources, such as general-purpose campus facilities.
The dynamic extension of an already existing dedicated facility.

8 CMS Workload Management System completely relies on HTCondor
CMS runs three main HTCondor pools:
ITB, for integration/testing purposes
Tier0, for all T0 activities
Global Pool (all the rest): Analysis & Production
CMS has a pilot-driven model: pilots are generated either by GlideinWMS or via autostart mechanisms.
End users are provided with a friendly interface called CRAB to submit analysis jobs to the HTCondor queue: CRAB packs the user's local analysis environment (configs, libs, etc.) and submits the actual jobs to HTCondor.
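To make the CRAB step concrete, here is a minimal sketch of a CRAB3-style analysis configuration. The request name, CMSSW config file, dataset and storage site are placeholders chosen for illustration, not values from the slides, and the exact parameter set depends on the CRAB release in use.

```python
# Minimal CRAB3-style analysis configuration sketch; all values are placeholders.
from CRABClient.UserUtilities import config

config = config()

config.General.requestName = 'indigo_mesos_test'             # illustrative request name
config.JobType.pluginName  = 'Analysis'
config.JobType.psetName    = 'analysis_cfg.py'               # the user's CMSSW config, packed by CRAB

config.Data.inputDataset   = '/SomeDataset/SomeEra/MINIAOD'  # placeholder dataset
config.Data.splitting      = 'FileBased'
config.Data.unitsPerJob    = 1

config.Site.storageSite    = 'T2_XX_Example'                 # placeholder output site
```

Submitting with `crab submit -c <config>` then hands the packed task to the HTCondor schedd described above.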

9 HTCondor: a central role
Putting everything together, our objective is to demonstrate the ability to dynamically add opportunistic resources to an already existing HTCondor pool. Moreover, we aim to provide a friendly and secure solution. CMS represents a perfect case study to test and evaluate our solution.

10 Expected impacts
By simplifying and automating the process of creating, managing and accessing a pool of computing resources, the project aims to improve:
Site management: a simple solution for dynamic/elastic T2 extensions on "opportunistic"/stable resources; a friendly procedure to dynamically instantiate a spot 'Tier3-like resource center'.
User experience: generation of an ephemeral T3 on demand, seen by the experiment computing infrastructure as a personal WLCG-type facility, in order to serve a group of collaborators. The system must allow the use of standard/regular CMS tools such as CRAB.
Experiment/collaboration resources: a comprehensive approach to opportunistic computing; a solution to access and orchestrate multiple (e.g. campus) centers, harvesting all the free CPU cycles without major deployment effort.

11 Four Pillars
Cluster Management: Mesos clusters as the solution to run Docker containers for all the services required by a regular CMS site (worker nodes, HTCondor schedd and squids). Marathon guarantees the dynamic scaling up and down of resources, a key point.
AuthN/Z & Credential Management: the INDIGO Identity and Access Management (IAM) service is responsible for AuthN/Z at cluster generation. The Token Translation Service (TTS) enables the conversion of IAM tokens into X.509 certificates. NOTE: this allows Mesos slaves (running the HTCondor startd daemon) to join the CMS central queue (HTCondor schedd) as regular Grid WNs.
Data Management: Dynafed & FTS is the approach currently followed by the project. We will investigate Oneclient (from Onedata) as a tool to mount a remote POSIX file system.
Automation: TOSCA templates, meant to be managed by the INDIGO PaaS Orchestrator, allow the automation of the overall setup. The aim is to produce a single YAML file describing the setup of all required services and dependencies. CLUES is able to scale the Mesos cluster as required by the load of the user jobs.
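As an illustration of the cluster-management pillar, the sketch below registers a worker-node service with Marathon through its REST API (POST /v2/apps), which is what makes the dynamic scaling possible. The Marathon endpoint and the Docker image name are assumptions for illustration, not the actual INDIGO/CMS values.

```python
# Sketch of registering a CMS worker-node service with Marathon (REST API /v2/apps).
# The Marathon endpoint and Docker image are illustrative placeholders.
import requests

MARATHON_URL = "http://marathon.example.org:8080"      # hypothetical Marathon master

wn_app = {
    "id": "/cms/worker-node",                          # illustrative app id
    "instances": 3,                                    # Marathon can scale this up/down dynamically
    "cpus": 1.0,
    "mem": 2048,
    "container": {
        "type": "DOCKER",
        "docker": {"image": "example/cms-wn:latest"},  # placeholder WN image (CVMFS + startd)
    },
}

resp = requests.post(MARATHON_URL + "/v2/apps", json=wn_app)
resp.raise_for_status()
```

Scaling the pool up or down then amounts to updating the `instances` field of the running app.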

12 Architecture
[Architecture diagram] The user submits a CRAB config with the analysis job description, pointing to SITENAME, to a schedd (CMS central or private). The PaaS Orchestrator reads the SITENAME description (number/type of services: squids, a schedd if needed, the desired range of WNs, Onedata/Dynafed-attached storage, TFC rules, fallback strategy, temporary storage to be used) and instantiates a Mesos cluster on one or more clouds (e.g. VM#1 running Squid1, VM#2-#4 running WN1-WN3). IAM provides the credentials and the TTS translates them so that the WNs can join the pool; data is accessed as defined in the TFC (Onedata, Dynafed, Xrootd federation).

13 The prototype... It is working
[Screenshots: Marathon apps, monitoring]
The Mesos cluster generation has been fully automated (not yet with TOSCA). We added CMS specifications to the HEAT (+Ansible) INDIGO template for a Mesos cluster: Squid proxy (Docker), CVMFS setup, WN (Docker), proxy manager service (Docker).
Currently, from the user's perspective, generating such a cluster means launching a single Heat stack creation, where env_heat allows specifying a few parameters such as the size of the cluster.
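A hedged sketch of what that single step could look like with the OpenStack client, wrapped in Python only for illustration; the template and environment file names and the stack name are assumptions, since the slide shows the actual command only as a screenshot.

```python
# Hypothetical invocation of the Heat stack creation for the Mesos cluster.
# Template name, environment file name and stack name are assumptions.
import subprocess

subprocess.run(
    [
        "openstack", "stack", "create",
        "-t", "mesos_cluster.yaml",   # INDIGO HEAT template with the CMS additions (assumed name)
        "-e", "env_heat.yaml",        # env_heat: user-tunable parameters, e.g. cluster size
        "cms-mesos-cluster",          # stack name (illustrative)
    ],
    check=True,
)
```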

14 A real test with CRAB jobs
[Screenshots: excerpt of the CRAB config, condor_q output, CMS Dashboard job monitoring]
Regular CRAB jobs successfully run on the generated Mesos cluster.
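A rough Python-bindings equivalent of the condor_q check shown on the slide; the owner constraint and the schedd (the local default) are placeholders.

```python
# Rough equivalent of the condor_q check, using the HTCondor Python bindings;
# the Owner value is a placeholder user name.
import htcondor

schedd = htcondor.Schedd()  # schedd the CRAB jobs were submitted to (local default here)
for job in schedd.query('Owner == "someuser"', ["ClusterId", "ProcId", "JobStatus"]):
    # JobStatus: 1 = Idle, 2 = Running, 4 = Completed
    print(job["ClusterId"], job["ProcId"], job["JobStatus"])
```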

15 … Inside the Mesos cluster
Worker nodes (Docker containers) are running and executing the CMS analysis payload.

16 About AuthN/Z
The Mesos cluster generation requires a valid IAM access token (OpenID Connect). All the rest is handled transparently to the user, thanks to:
Delegation of credentials among services
Credential renewal (through the token-exchange feature)
The Token Translation Service, which creates credentials for services that do not integrate OIDC
We generate our cluster by passing the IAM token as the incoming credential, and our WNs (startd) authenticate with the schedd using the X.509 proxy obtained from the TTS.
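A minimal sketch of the token-to-proxy step, assuming a hypothetical TTS REST endpoint; the URL, response field and proxy path below are illustrative, not the actual WaTTS/TTS interface.

```python
# Hypothetical sketch of exchanging an IAM token for an X.509 proxy via the TTS.
# The endpoint path, response format and proxy location are assumptions.
import os
import requests

iam_token = os.environ["IAM_ACCESS_TOKEN"]           # OpenID Connect access token from INDIGO IAM
tts_url = "https://tts.example.org/api/credential"   # hypothetical TTS endpoint

resp = requests.post(tts_url, headers={"Authorization": "Bearer " + iam_token})
resp.raise_for_status()

proxy_path = "/tmp/x509up_mesos_wn"                  # where the startd would pick it up (assumed)
with open(proxy_path, "w") as f:
    f.write(resp.json()["credential"])               # assumed response field
```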

17 Data Management
So far we rely on the storage solutions already adopted by the CMS computing model:
Job input data is accessed through the xrootd data federation
Job output is copied out with "gfal-copy"
As a next step, the plan for the second phase of this project foresees the exploitation of the INDIGO data management solutions: Onedata, Dynafed.
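For concreteness, a minimal stage-out sketch matching this flow; the destination endpoint and file names are placeholders, not real CMS site paths.

```python
# Illustrative stage-out: the analysis job reads input over the xrootd federation
# (e.g. root://cms-xrd-global.cern.ch//store/...) and copies its output back with gfal-copy.
# Destination URL and file names are placeholders.
import subprocess

local_output = "output.root"
destination = "srm://storage.example.org/cms/store/user/someuser/output.root"

subprocess.run(["gfal-copy", local_output, destination], check=True)
```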

18 Upcoming challenges
Moving from HEAT to TOSCA in order to exploit the INDIGO PaaS Orchestrator
Dynamic auto-scaling of the cluster, e.g. by integrating CLUES
Including a schedd among the services running on the Mesos cluster: this allows testing additional setups (flocking configuration) and represents a step towards a more general solution
Multi-IaaS interoperability: testing the proposed solution making effective use of opportunistic and possibly public providers

19 Conclusion
We developed a prototype for generating an "ephemeral T3 site as a service"
We enabled the complete automation of the cluster generation
We integrated the INDIGO credential harmonization
We successfully executed CMS analysis jobs
Although the presented prototype is focused on CMS workflows, the adopted approach can easily be generalized, e.g. to anyone who can integrate HTCondor

