José M. Hernández (CIEMAT): Grid Computing in the CMS Experiment at the LHC. Jornada de usuarios de Infraestructuras Grid, 19-20 January 2012, CIEMAT, Madrid.

Presentation transcript:

Grid Computing in the CMS Experiment at the LHC
José M. Hernández, CIEMAT
Jornada de usuarios de Infraestructuras Grid, 19-20 January 2012, CIEMAT, Madrid

The CMS Experiment at the LHC
 The Large Hadron Collider: p-p collisions at 7 TeV, 40 MHz
 The Compact Muon Solenoid: precision measurements and searches for new phenomena

LHC: a challenge for computing
 The Large Hadron Collider at CERN is the largest scientific instrument on the planet
 Unprecedented data handling scale
   40 MHz event rate (~1 GHz collision rate) → ~100 TB/s → online filtering to ~300 Hz (~300 MB/s) → ~3 PB/year (10^7 s of data taking per year)
 Need large computing power to process the data
   Complex events
   Many interesting signals with rates << 1 Hz
 Thousands of scientists around the world access and analyze the data
 Need a computing infrastructure able to store, move around the globe, process, simulate and analyze data at the petabyte scale [O(10) PB/year]; the rate arithmetic is sketched after this slide
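As a rough illustration of how the numbers on this slide fit together, the sketch below recomputes the stored data volume from the quoted rates. The ~1 MB raw event size is an assumption introduced only to make the chain explicit; it is not stated on the slide.

```python
# Back-of-the-envelope check of the data-rate chain quoted on the slide.
# Assumed (not stated on the slide): ~1 MB per stored event after the
# online filter, which is what makes 300 Hz correspond to ~300 MB/s.

trigger_output_hz = 300            # event rate kept after online filtering
event_size_bytes = 1.0e6           # ~1 MB per stored event (assumption)
live_seconds_per_year = 1.0e7      # ~10^7 s of data taking per year

stored_rate = trigger_output_hz * event_size_bytes       # bytes per second
yearly_volume = stored_rate * live_seconds_per_year      # bytes per year

print(f"after online filter: {stored_rate / 1e6:.0f} MB/s")   # ~300 MB/s
print(f"stored per year    : {yearly_volume / 1e15:.1f} PB")  # ~3.0 PB
```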

The LHC Computing Grid
LCG: 300+ centers, 50+ countries, ~100k CPUs, ~100 PB of disk/tape, 10k users
 The LHC Computing Grid provides the distributed computing infrastructure
   Computing resources (CPU, storage, networking)
   Computing services (data and job management, monitoring, etc.)
   Integrated to provide a single LHC computing service
 Using Grid technologies
   Transparent and reliable access to heterogeneous, geographically distributed computing resources over the Internet
   High-capacity wide area networking

The CMS Computing Model
 Distributed computing model for data storage, processing and analysis
 Grid technologies (Worldwide LHC Computing Grid, WLCG)
 Tiered architecture of computing resources
 ~20 petabytes of data (real and simulated) every year
 About 200k jobs (data processing, simulation production and analysis) per day

WLCG network infrastructure
 T0-T1 and T1-T1 interconnected via LHCOPN (10 Gbps links)
 T1-T2 and T2-T2 traffic uses general-purpose research networks
 Dedicated network infrastructure (LHCONE) being deployed

Grid services in WLCG
 Middleware providers: gLite/EMI, OSG, ARC
 Global services: data transfer and job management, authentication/authorization, information system
 Compute elements (gateway, local batch system, worker nodes) and storage elements (GridFTP servers, disk servers, mass storage system) at the sites
 Experiment-specific services

CMS Data and Workload Management
 Experiment-specific data and workload management (DMWM) services on top of basic Grid services
 Pilot-based WMS
 Data bookkeeping, location and transfer systems
 Data pre-located; jobs go to the data (see the sketch after this slide)
 Experiment software pre-installed at sites
[Architecture diagram: CMS services (WMAgent production system, CRAB analysis system, DBS data bookkeeping and location system, PhEDEx data transfer system) sit on top of Grid services (gLite WMS, file transfer system, CE/SE) at the sites (local batch system, mass storage system), serving operators and users.]
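To illustrate the "data pre-located, jobs go to data" idea, here is a minimal, purely conceptual sketch of a broker that routes jobs to sites already hosting their input dataset. The catalogue contents, site names and function names are hypothetical; the real CMS systems (DBS, PhEDEx, WMAgent, CRAB) are far more elaborate.

```python
# Conceptual sketch: route each job to a site that already hosts its input
# dataset, mimicking the "jobs go to data" policy. All names are made up.

from collections import defaultdict

# Hypothetical data-location catalogue: dataset -> sites holding a replica
replica_catalogue = {
    "/MinBias/Run2011A/RECO": ["T1_ES_PIC", "T2_ES_CIEMAT"],
    "/TTbar/Summer11/AODSIM": ["T2_ES_IFCA"],
}

# Hypothetical free-slot count per site
free_slots = {"T1_ES_PIC": 120, "T2_ES_CIEMAT": 40, "T2_ES_IFCA": 0}


def broker(jobs):
    """Assign each (job_id, dataset) to the hosting site with most free slots."""
    assignments = defaultdict(list)
    for job_id, dataset in jobs:
        candidates = [s for s in replica_catalogue.get(dataset, [])
                      if free_slots.get(s, 0) > 0]
        if not candidates:
            assignments["pending"].append(job_id)   # wait, or trigger a transfer
            continue
        site = max(candidates, key=lambda s: free_slots[s])
        free_slots[site] -= 1
        assignments[site].append(job_id)
    return dict(assignments)


jobs = [(1, "/MinBias/Run2011A/RECO"), (2, "/TTbar/Summer11/AODSIM"),
        (3, "/MinBias/Run2011A/RECO")]
print(broker(jobs))   # {'T1_ES_PIC': [1, 3], 'pending': [2]}
```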

CMS Grid Operations - Jobs
 Large-scale data processing & analysis
 ~50k used slots, 300k jobs/day (rough implication worked out below)
 Plots correspond to Aug 2011 – Jan 2012
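A quick consistency check of these two numbers, assuming the ~50k slots are kept busy around the clock (an assumption, not a quote): they imply an average job length of a few hours.

```python
# Rough estimate of the average job length implied by the slide's numbers,
# assuming ~50k slots busy essentially all day.

used_slots = 50_000
jobs_per_day = 300_000
seconds_per_day = 86_400

slot_seconds_per_day = used_slots * seconds_per_day
avg_job_seconds = slot_seconds_per_day / jobs_per_day

print(f"average job length ~ {avg_job_seconds / 3600:.1f} h")   # ~4.0 h
```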

Spanish contribution to CMS Computing Resources
 Spain contributes ~5% of the CMS computing resources
 PIC Tier-1: ~1/2 of an average Tier-1; 3000 cores, 4 PB disk, 6 PB tape
 IFCA Tier-2: ~2/3 of an average Tier-2 (~3% of T2 resources); 1000 CPUs, 600 TB disk
 CIEMAT Tier-2: ~2/3 of an average Tier-2 (~3% of T2 resources); 1000 cores, 600 TB disk

Contribution from Spanish sites
 ~5% of the total CPU delivered for CMS
[Plot: CPU delivered, Feb 2011 – Jan 2012]

CMS Grid Operations - Data
 Large-scale data replication
 1-2 GB/s throughput CMS-wide, ~1 PB/week of data transfers (a quick consistency check follows)
 Full transfer mesh among 50+ sites (T0→T1, T1→T1, T1→T2, T2→T2)
[Plots: production and debug transfer rates, each around 1 GB/s]
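As a sanity check (a trivial calculation, not from the slide), sustained rates of that order do indeed add up to about a petabyte per week:

```python
# ~1 PB/week versus 1-2 GB/s: a sustained ~1.65 GB/s moves 1 PB in a week.
seconds_per_week = 7 * 24 * 3600
rate_gb_per_s = 1e15 / 1e9 / seconds_per_week   # GB/s needed for 1 PB/week
print(f"1 PB/week ~ {rate_gb_per_s:.2f} GB/s sustained")   # ~1.65 GB/s
```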

Site monitoring/readiness

Lessons learnt
 Porting the production and analysis applications to the Grid was easy
   Package the job wrapper and user libraries into the input sandbox
   Experiment software pre-installed at the sites
   The job wrapper sets up the environment, runs the job and stages out the output (a minimal sketch follows this slide)
 When running at large scale in WLCG, additional services are needed
   Job and data management services on top of Grid services
   Data bookkeeping and location
   Monitoring
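A minimal sketch of what such a job wrapper does, assuming a pre-installed software setup script at the site. The setup-script path, payload, output file and storage endpoint are hypothetical placeholders, and the copy command is only indicative; the real CMS wrapper is considerably more involved.

```python
# Minimal job-wrapper sketch: set up the environment, run the payload,
# stage out the output. Paths, payload and endpoint are placeholders.

import os
import subprocess
import sys

SETUP_SCRIPT = "/opt/exp_sw/setup.sh"        # experiment software pre-installed at the site (assumed)
PAYLOAD = "python analysis.py"               # user application shipped in the input sandbox
OUTPUT_FILE = "result.root"
DEST_URL = "srm://storage.example/store/user/result.root"   # hypothetical storage endpoint


def run(cmd):
    """Run a shell command string under bash, or an argv list directly."""
    print("running:", cmd)
    if isinstance(cmd, str):
        return subprocess.call(["bash", "-c", cmd])
    return subprocess.call(cmd)


def main():
    # 1. Set up the environment and run the payload inside the same shell.
    if run(f"source {SETUP_SCRIPT} && {PAYLOAD}") != 0:
        sys.exit("payload failed")
    # 2. Check that the payload actually produced its output.
    if not os.path.exists(OUTPUT_FILE):
        sys.exit("no output produced")
    # 3. Stage the output out to remote storage (placeholder copy command).
    if run(["lcg-cp", f"file://{os.path.abspath(OUTPUT_FILE)}", DEST_URL]) != 0:
        sys.exit("stage-out failed")


if __name__ == "__main__":
    main()
```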

Lessons learnt
 Monitoring is essential
   Multi-layer, complex system (experiment, Grid and site layers)
   Monitor workflows, services and sites
 Experiment services should be robust
   Deal with the (inherent) unreliability of the Grid
   Be prepared for retries and cool-off (see the retry sketch after this slide)
 Pilot-based WMS
   gLite BDII and WMS not reliable enough
   Smaller overhead, verification of the node environment, global priorities, etc.
   Isolates users from the Grid
 Grid operations team
   Lots of manpower needed to operate the system
   Central operations team (~20 FTE)
   Contacts at sites (50+)
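As an illustration of the "retries and cool-off" point, here is a minimal, generic sketch of retrying a flaky Grid operation with an increasing cool-off delay. It is not taken from any CMS component; the operation, retry count and delays are placeholders.

```python
# Generic retry-with-cool-off sketch for an unreliable Grid operation.
# The operation and the retry/delay parameters are illustrative placeholders.

import random
import time


def flaky_transfer():
    """Stand-in for a Grid call (e.g. a file transfer) that sometimes fails."""
    if random.random() < 0.6:
        raise RuntimeError("transfer failed")
    return "ok"


def with_retries(operation, max_attempts=5, cool_off=30.0, backoff=2.0):
    """Retry `operation`, sleeping an increasing cool-off time between attempts."""
    delay = cool_off
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RuntimeError as err:
            print(f"attempt {attempt} failed ({err}); cooling off {delay:.0f}s")
            if attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= backoff          # progressively longer cool-off


if __name__ == "__main__":
    print(with_retries(flaky_transfer, cool_off=1.0))   # short delays for the demo
```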

Future developments
 Dynamic data placement/deletion (see the toy cache sketch after this slide)
   Most of the pre-located data is not accessed much
   Investigating automatic replication of hot data and deletion of cold data
   Replicate data when accessed by jobs and cache it locally
 Remote data access
   Jobs go to free slots and access the data remotely
   CMS has greatly improved read performance over the WAN
   At the moment only used for fail-over and overflow
 Service to asynchronously copy user data
   Remote stage-out from the worker node is a bad idea
 Multi-core processing
   More efficient use of multi-core nodes, savings in RAM, far fewer jobs to handle
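A toy illustration of the dynamic-placement idea: replicate datasets that are accessed often, and flag replicas that have gone cold for deletion. The thresholds, dataset names and eviction rule are invented for the example; the real CMS popularity-based machinery differs.

```python
# Toy dynamic data placement: count accesses per dataset, replicate "hot"
# datasets and mark replicas of "cold" ones for deletion. All values invented.

from collections import Counter

HOT_THRESHOLD = 3     # accesses within the window that trigger an extra replica
COLD_THRESHOLD = 0    # accesses at or below this allow replica deletion


def plan_placement(access_log, replicas):
    """Return (datasets to replicate, datasets whose replicas can be deleted)."""
    counts = Counter(access_log)
    replicate = [d for d, n in counts.items() if n >= HOT_THRESHOLD]
    # Anything with replicas but no recent accesses is a deletion candidate.
    delete = [d for d in replicas if counts.get(d, 0) <= COLD_THRESHOLD]
    return replicate, delete


access_log = ["dsA", "dsA", "dsB", "dsA", "dsC"]   # hypothetical job accesses
replicas = {"dsA": 1, "dsB": 2, "dsD": 3}          # current replica counts per dataset
hot, cold = plan_placement(access_log, replicas)
print("replicate:", hot)   # ['dsA']
print("delete   :", cold)  # ['dsD']
```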

Future developments
 Virtualization of worker nodes / cloud computing
   Decouple the node OS from the application environment using VMs or chroot
   Allow the use of opportunistic resources
 CernVM-FS (CVMFS) for experiment software distribution

Summary
 CMS has been very successful in using the LHC Computing Grid at large scale
 A lot of work has gone into making the system efficient, reliable and scalable
 Some developments are in the pipeline to make CMS distributed computing more dynamic and transparent