Experience with the WLCG Computing Grid, 10 June 2010, Ian Fisk

A Little History
‣ The LHC Computing Grid (LCG) was approved by CERN Council in September 2001.
‣ The first Grid Deployment Board met in October 2002.
‣ LCG was built on services developed in Europe and the US.
‣ LCG has collaborated with a number of Grid projects: EGEE, NorduGrid, and Open Science Grid.
‣ It evolved into the Worldwide LCG (WLCG).
‣ Its services support the four LHC experiments.

Today’s WLCG
‣ More than 170 computing facilities in 34 countries
‣ More than 100k processing cores
‣ More than 50 PB of disk

‣ Running increasingly high workloads:
  ‣ Jobs in excess of 900k/day; millions per day anticipated soon
  ‣ CPU equivalent of ~100k cores
‣ Workloads are:
  ‣ Real data processing
  ‣ Simulations
  ‣ Analysis, with more and more (new) users: several hundred now
‣ Data transfers at unprecedented rates
‣ Today WLCG delivers about 900k jobs/day and ~100k CPU-days/day.
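
Putting the two headline numbers together is a useful sanity check. The short calculation below is only a back-of-the-envelope sketch based on the figures quoted above (900k jobs/day and ~100k CPU-days/day); the assumption that each job uses a single core is mine.

```python
# Rough arithmetic relating the quoted WLCG numbers (illustrative only).
jobs_per_day = 900_000        # ~900k grid jobs per day
cpu_days_per_day = 100_000    # ~100k CPU-days of work delivered per day

# Delivering 100k CPU-days every day is equivalent to ~100k cores busy 24/7.
equivalent_busy_cores = cpu_days_per_day

# Average wall-clock length of a job, assuming one core per job (assumption).
avg_job_hours = cpu_days_per_day * 24 / jobs_per_day

print(f"~{equivalent_busy_cores:,} continuously busy cores")
print(f"average job length ~{avg_job_hours:.1f} hours")   # ~2.7 hours
```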

Services
‣ A basic set of grid services at sites provides access to processing and storage.
‣ Tools securely authenticate users and manage the membership of each experiment (VOMS).
‣ Not all experiments use all services, or use them in the same way.
‣ Diagram: lower-level services (CE, SE, information system/BDII, FTS, WMS) provide consistent interfaces to the facilities; the experiments layer their own higher-level services on top.
‣ At each site: connection to batch through the CE (Globus based, more recently CREAM based) and connection to storage through SRM or xrootd.
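
To make the division of labour between the layers concrete, here is a toy sketch of the path a single job takes through the stack. The service names (VOMS, BDII, WMS, CE, SE) are the ones on the slide, but the functions, site names and matchmaking rule below are hypothetical stand-ins, not the real middleware APIs.

```python
# Toy model of the chain a grid job passes through (conceptual sketch only;
# the function names, sites and matchmaking rule are invented for illustration).

SITES = {
    "SiteA": {"vos": ["cms", "atlas"], "free_slots": 120, "se": "srm://siteA/"},
    "SiteB": {"vos": ["alice"],        "free_slots": 300, "se": "srm://siteB/"},
}

def voms_authenticate(user, vo):
    """VOMS: attach the experiment (VO) membership to the user's credentials."""
    return {"user": user, "vo": vo}

def bdii_lookup(vo):
    """Information system (BDII): find the sites that support this VO."""
    return [name for name, info in SITES.items() if vo in info["vos"]]

def wms_match(candidates):
    """WMS: choose a Computing Element; here simply the one with most free slots."""
    return max(candidates, key=lambda s: SITES[s]["free_slots"])

def ce_submit(site, job):
    """CE: hand the job to the site batch system; the SE serves the input data."""
    return f"{job} submitted to {site}, reading input from {SITES[site]['se']}"

proxy = voms_authenticate("some_user", "cms")
site = wms_match(bdii_lookup(proxy["vo"]))
print(ce_submit(site, "analysis_job_42"))   # analysis_job_42 submitted to SiteA, ...
```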

Architectures
‣ To greater and lesser extents, the LHC computing models are based on the MONARC model.
‣ Developed more than a decade ago.
‣ It foresaw tiered computing facilities to meet the needs of the LHC experiments.
‣ Diagram: a Tier-0, multiple Tier-1s and many Tier-2s, as used by ATLAS, ALICE, CMS and LHCb, organized either as clouds or as a mesh.
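
A minimal sketch of what such a tiered topology looks like as a data structure, and of the difference between routing transfers within Tier-1 "clouds" and across a full mesh. The site names and the exact cloud rule are invented for illustration; only the tier structure itself comes from the slide.

```python
# MONARC-style tier hierarchy with invented site names (illustration only).
TIER1_CLOUDS = {
    "T1_A": ["T2_A1", "T2_A2"],   # Tier-2s attached to T1_A's cloud
    "T1_B": ["T2_B1", "T2_B2"],   # Tier-2s attached to T1_B's cloud
}

def cloud_of(site):
    """Return the Tier-1 a Tier-2 belongs to; Tier-0/Tier-1s are their own cloud."""
    for t1, t2s in TIER1_CLOUDS.items():
        if site in t2s:
            return t1
    return site

def transfer_allowed(src, dst, policy):
    """Cloud policy (assumed rule): sites only exchange data within their cloud,
    or with the Tier-0. Mesh policy: any site may transfer to any other site."""
    if policy == "mesh":
        return True
    return cloud_of(src) == cloud_of(dst) or "Tier-0" in (src, dst)

print(transfer_allowed("T2_A1", "T1_B", policy="cloud"))  # False: crosses clouds
print(transfer_allowed("T2_A1", "T1_B", policy="mesh"))   # True
```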

Data Taking
‣ An extremely interesting region: the exponential increase in luminosity means a good weekend can double or triple the dataset.
‣ A significant failure or outage during a fill would cost a big fraction of the total data.
‣ Original planning for computing in the first six months assumed higher data volumes (tens of inverse picobarns).
‣ Total volumes of data are not stressing the resources.
‣ The slower ramp has allowed the predicted activities to be performed more frequently.
‣ Integrated luminosity: 7-8 nb^-1 as of May 26, 2010. Data volumes so far: ATLAS ~120 TB, CMS ~35 TB, LHCb 20 TB.

Activities
‣ The limited volume of data has allowed a higher frequency of workflows, which is important during the commissioning phase.
‣ All experiments report that the workflows executed are of the type, and run at the locations, predicted in the computing model.
‣ Diagram (CMS example): prompt reconstruction at the Tier-0 and CAF; re-reconstruction, simulation archiving and data serving at the Tier-1s; simulation and user analysis at the Tier-2s; plus storage commissioning.
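
The workflow placement described above can be written down directly as a small lookup table. This just restates the CMS example from the slide in code form; it is not an official placement policy.

```python
# Workflow placement, following the CMS example on the slide (sketch only).
PLACEMENT = {
    "prompt reconstruction": ["Tier-0", "CAF"],
    "re-reconstruction":     ["Tier-1"],
    "simulation archiving":  ["Tier-1"],
    "data serving":          ["Tier-1"],
    "simulation":            ["Tier-2"],
    "user analysis":         ["Tier-2"],
}

def tiers_for(workflow):
    """Where the computing model expects a given workflow to run."""
    return PLACEMENT.get(workflow, [])

print(tiers_for("user analysis"))   # ['Tier-2']
```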

Reliability
‣ The vast majority of the computing resources for the LHC are away from CERN.
‣ WLCG services are carefully monitored to ensure access to the resources.
‣ Much of the effort in the last few years has gone into improving operations.
“Distributed computing is when a machine you’ve never heard of before, half a world away, goes down and stops you from working.”
“We’ve made the world’s largest Tamagotchi” - Lassi Tuura

Reliability
‣ Monitoring of the basic WLCG services
‣ Clear improvement in preparation for data taking
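
For context, the monitoring behind plots like these boils down to turning the results of periodic service tests into availability and reliability numbers. The bookkeeping below is my own simplified sketch of the usual WLCG-style definitions (reliability discounts scheduled downtime), not code from the monitoring system.

```python
# Simplified availability/reliability bookkeeping for one site over one day
# (a sketch of the usual definitions, not the real WLCG monitoring code).
hours_total = 24
hours_ok = 21               # hours in which the periodic service tests passed
hours_scheduled_down = 2    # announced maintenance window (example value)

availability = hours_ok / hours_total
# Reliability ignores scheduled downtime: a site is only penalized for time
# when it was supposed to be up but failed the tests.
reliability = hours_ok / (hours_total - hours_scheduled_down)

print(f"availability {availability:.0%}, reliability {reliability:.0%}")
# availability 88%, reliability 95%
```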

Readiness as Seen by the Experiments
‣ Site readiness as seen by the experiments (plots: left-hand side, the week before data taking; right-hand side, the first week of data).
‣ Experiment tests include specific workflows.

Data Transfer and Reprocessing
‣ Diagram: data flows from the Tier-0 archive to the Tier-1 archives over the LHC Optical Private Network (OPN).
‣ OPN links are now fully redundant; the last one (RAL) is now in production.
‣ A fibre cut during STEP’09 caused no interruption thanks to the redundancy.

Transfers
‣ Transfers from CERN to the Tier-1s are large, and driven by collision and cosmic running.
‣ Castor traffic last month: more than 4 GB/s input, more than 13 GB/s served.
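
Converted into daily volumes, those sustained rates are easier to picture; the conversion below is simple arithmetic on the numbers quoted above, nothing more.

```python
# Convert the quoted sustained Castor rates into daily volumes (illustration).
seconds_per_day = 24 * 3600

input_rate_gb_s = 4     # > 4 GB/s written into Castor
served_rate_gb_s = 13   # > 13 GB/s read back out

input_tb_per_day = input_rate_gb_s * seconds_per_day / 1000
served_tb_per_day = served_rate_gb_s * seconds_per_day / 1000

print(f"input  > {input_tb_per_day:.0f} TB/day")   # > 346 TB/day
print(f"served > {served_tb_per_day:.0f} TB/day")  # > 1123 TB/day
```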

Transfer Peaks
‣ Peak transfers are meeting the computing model expectations (plots for LHCb, CMS and ATLAS).

Data Reprocessing Profile
‣ Once data arrives at the Tier-1 centres it is reprocessed with new calibrations, software and data formats.
‣ Plot: average concurrent reprocessing jobs for ALICE, CMS and ATLAS.

Data Processing
‣ The LHC experiments can currently reprocess the entire collected dataset in less than a week.
‣ These reprocessing passes will grow to several months as the data volumes increase.
‣ Grid interfaces to the batch systems are scaling well.
‣ Storage systems are ingesting and serving data.
‣ CPU efficiency is reaching expectations for organized processing.
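
Why a full pass currently fits into a week, and why later passes will stretch to months, is largely a matter of scale. The estimate below uses invented example numbers (event count, per-event reconstruction time, core count, efficiency) purely to show the scaling; none of them come from the talk.

```python
# Back-of-the-envelope reprocessing time estimate with made-up example inputs.
events_collected = 500e6      # events on tape (hypothetical)
sec_per_event = 10            # reconstruction time per event on one core (hypothetical)
cores_available = 20_000      # cores devoted to the pass (hypothetical)
cpu_efficiency = 0.85         # fraction of wall time actually spent computing

wall_seconds = events_collected * sec_per_event / (cores_available * cpu_efficiency)
print(f"full reprocessing pass ~ {wall_seconds / 86400:.1f} days")   # ~3.4 days

# Growing the dataset by an order of magnitude at fixed capacity turns a
# few-day pass into a month-scale campaign, as the slide anticipates.
```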

Transfers to Tier-2s and Analysis
‣ Once data is on the WLCG it is made accessible to analysis applications.
‣ The largest fraction of analysis computing is at the Tier-2s.
‣ Diagram: data flows from the Tier-1 archive to the Tier-2s. Plot (ATLAS): example of data collection vs. computing exercise.

Data Moving to Tier-2s (CMS)
‣ The Tier-1 to Tier-2 average rate is coming up.
‣ 49 Tier-2 sites have received data since the start of 7 TeV collisions.
‣ After significant effort by the commissioning team there are now serious Tier-2 to Tier-2 transfers, which are good for group skims and for replicating data.
‣ Plot annotations: 2 GB/s, and 0.5 GB/s for Tier-2 to Tier-2 transfers.

ALICE Analysis Train
‣ ALICE has an interesting analysis train model: user code is picked up and executed together with other analyses in a single pass over the data.
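
A minimal sketch of the train idea, assuming the essential point is that many user tasks share a single read of each event; the code is illustrative Python, not the ALICE analysis framework, and the tasks and event fields are invented.

```python
# Conceptual sketch of an "analysis train": several user analysis tasks are
# attached to one job, so each event is read from storage only once and then
# handed to every task in turn. Illustrative only, not the ALICE framework.

def user_task_a(event):
    return event["pt"] > 1.0        # e.g. select high-pT candidates

def user_task_b(event):
    return event["charge"] == 0     # e.g. select neutral candidates

def run_train(events, tasks):
    """Read each event once and pass it to all attached tasks."""
    counts = {task.__name__: 0 for task in tasks}
    for event in events:            # single pass over the (expensive) input
        for task in tasks:
            if task(event):
                counts[task.__name__] += 1
    return counts

events = [{"pt": 0.4, "charge": 0}, {"pt": 2.1, "charge": 1}, {"pt": 3.0, "charge": 0}]
print(run_train(events, [user_task_a, user_task_b]))
# {'user_task_a': 2, 'user_task_b': 2}
```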

ALICE End User Analysis
‣ ~200 users; ~500 jobs on average over 3 months.

ATLAS Analysis Data Access
‣ Data were analyzed on the Grid already from the first days of data taking.
‣ Data are analyzed at many Tier-2, and some Tier-1, sites (see the data access counts in the top pie chart).
‣ Many hundreds of users have accessed data on the Grid.

CMS Analysis Activity
‣ The number of people participating in analysis is increasing: 465 active users (plot annotations mark the October exercise and the holiday period).
‣ The largest access is to collision data and the corresponding simulation.

CMS Analysis Ratio
‣ Plot only.

LHCb Analysis Usage
‣ Good balance across sites.
‣ Peak of 1200 CPU-days delivered in a single day.

Simulated Event Production
‣ Simulated event production is one of the earliest grid applications and has been very successful.
‣ Pile-up and more realistic simulation are making this a more interesting problem.
‣ Diagram: simulation runs at the Tier-2s and the output is archived at the Tier-1s.
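
Why pile-up makes simulation "a more interesting problem" is easiest to see in the mixing step: each simulated hard-scatter event has to be overlaid with several minimum-bias events, multiplying the data a production job must handle. The sketch below is purely illustrative and is not any experiment's mixing module.

```python
# Toy illustration of pile-up mixing: every signal event gets a set of
# minimum-bias events overlaid on it. Invented example, not experiment code.
import random

def mix_pileup(signal_events, minbias_pool, pileup_per_event):
    """Attach `pileup_per_event` randomly chosen minimum-bias events to each signal event."""
    mixed = []
    for signal in signal_events:
        overlay = random.sample(minbias_pool, pileup_per_event)
        mixed.append({"signal": signal, "pileup": overlay})
    return mixed

signal = ["hard_scatter_1", "hard_scatter_2"]
minbias_pool = [f"minbias_{i}" for i in range(1000)]

mixed = mix_pileup(signal, minbias_pool, pileup_per_event=8)
# With 8 pile-up events per signal event, each simulated collision involves
# 9x as many events as the no-pile-up case.
print(len(mixed[0]["pileup"]))   # 8
```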

ALICE MC Production
‣ MC production runs at all T1/T2 sites.
‣ 30 TB produced, including replicas.

ATLAS 2009 Simulation Production
‣ Simulation production continues in the background all the time.
‣ Fluctuations have a range of causes, including release cycles, site downtimes, etc.
‣ A parallel effort is underway for MC reconstruction and reprocessing, including reprocessing of the 900 GeV and 2.36 TeV MC samples with the same reconstruction release as for the data reprocessing, done in December.
‣ More simulated ESDs have been produced since Dec 2009 to match the 2009 real-data analysis.
‣ Plots: production jobs running in each cloud (16 Feb 2009 to 15 Feb 2010) and simulated data produced, in TB (1 Jul 2009 to 15 Feb 2010).

CMS MC Production by Month
‣ The total number of events produced in 2009 is large.
‣ There are big variations by month, generally related to a lack of work rather than a lack of capacity.
‣ Simulation uses about 8k slots, with 57 Tier-2s plus Tier-3s participating.

LHCb
‣ 139 sites hit, 4.2 million jobs.
‣ Start in June: start of MC09.

Outlook
‣ The Grid infrastructure is working for physics:
  ‣ Data is reprocessed
  ‣ Simulated events are produced
  ‣ Data is delivered to analysis users
‣ The integrated volume of data and the livetime of the accelerator are still lower than the final planning assumed, and not all resources are equally utilized.
‣ The activity level is high, and there is still a lot to learn as the datasets grow.