Petabyte-scale computing for LHC. Ian Bird, CERN, WLCG Project Leader. ISEF Students, 18th June 2012. Accelerating Science and Innovation.

Enter a New Era in Fundamental Science. The start-up of the Large Hadron Collider (LHC), one of the largest and truly global scientific projects ever, is the most exciting turning point in particle physics: the exploration of a new energy frontier. The LHC ring is 27 km in circumference, with four experiments: ALICE, ATLAS, CMS and LHCb.

Some history of scale… (date: collaboration size; data volume, archive technology)
– Late 1950's: 2-3; kilobits, notebooks
– 1960's: 10-15; kB, punch cards
– 1970's: ~35; MB, tape
– 1980's: ~100; GB, tape, disk
– 1990's: TB, tape, disk
– 2010's: ~3000; PB, tape, disk
For comparison: in the 1990's the total LEP data set was a few TB and would fit on one tape today; today, one year of LHC data is ~25 PB.
Where does all this data come from? CERN has about 60,000 physical disks to provide about 20 PB of reliable storage.
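A quick back-of-the-envelope check on those numbers (a sketch only: the 2012-era tape-cartridge capacity is my assumption, not a figure from the slide):

```python
# Rough arithmetic behind the storage figures quoted above.
PB = 1e15          # bytes
TB = 1e12

lhc_year = 25 * PB            # ~25 PB of LHC data per year (from the slide)
disks = 60_000                # CERN's physical disks (from the slide)
reliable_capacity = 20 * PB   # reliable storage they provide (from the slide)

# Average usable capacity per disk implied by the slide's numbers
# (lower than the raw drive size because of redundancy/replication).
print(reliable_capacity / disks / TB)   # ~0.33 TB per disk

# The LEP comparison: a few TB fit on a single modern tape.
tape_2012 = 5 * TB            # assumed capacity of a 2012-era tape cartridge
print(lhc_year / tape_2012)   # ~5000 such tapes for one year of LHC data
```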

150 million sensors deliver data … 40 million times per second.

What is this data?
Raw data:
– Was a detector element hit?
– How much energy?
– What time?
Reconstructed data:
– Momentum of tracks (4-vectors)
– Origin
– Energy in clusters (jets)
– Particle type
– Calibration information
– …

Data and Algorithms
HEP data are organized as Events (particle collisions). Simulation, Reconstruction and Analysis programs process "one Event at a time"; Events are fairly independent, so processing is trivially parallel. Event processing programs are composed of a number of Algorithms selecting and transforming "raw" Event data into "processed" (reconstructed) Event data and statistics.
Data tiers:
– RAW (~2 MB/event): detector digitisation; triggered events recorded by the DAQ
– ESD/RECO (~100 kB/event): reconstructed, pseudo-physical information: clusters, track candidates
– AOD (~10 kB/event): analysis information; physical information: transverse momentum, association of particles, jets, identification of particles
– TAG (~1 kB/event): classification information relevant for fast event selection
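Since events are independent, event processing is trivially parallel: a worker just loops over events, and many workers can run at once with no communication. A minimal Python sketch of that pattern (the event contents and the `reconstruct` function are invented for illustration; real experiment frameworks are large C++ applications):

```python
from multiprocessing import Pool

def reconstruct(raw_event):
    """Toy 'algorithm': turn raw hits into reconstructed quantities.
    Stands in for the chain RAW -> ESD/RECO -> AOD described above."""
    hits = raw_event["hits"]
    return {"event_id": raw_event["event_id"],
            "n_hits": len(hits),
            "energy_sum": sum(h["energy"] for h in hits)}

if __name__ == "__main__":
    # Fake RAW events; in reality each is ~2 MB of detector digitisation.
    raw_events = [{"event_id": i,
                   "hits": [{"energy": 0.1 * j} for j in range(i % 5 + 1)]}
                  for i in range(10)]

    # Events are independent, so they can be farmed out to workers
    # (or, in WLCG, to thousands of grid jobs) with no communication.
    with Pool(processes=4) as pool:
        reco = pool.map(reconstruct, raw_events)
    print(reco[0])
```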

Data Handling and Computation for Physics Analysis (schematic): the detector feeds an event filter (selection & reconstruction); raw data are reconstructed into event summary data, which batch physics analysis turns into analysis objects (extracted by physics topic) for interactive physics analysis; event reprocessing and event simulation run alongside.

The LHC Computing Challenge
– Signal/Noise: ~10^-9 (offline)
– Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year (22 PB in 2011)
– Compute power: event complexity × number of events × thousands of users → 200k CPUs (now 250k) and 45 PB of disk storage (now 150 PB)
– Worldwide analysis & funding: computing is funded locally in major regions & countries; efficient analysis everywhere → GRID technology
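As a hedged order-of-magnitude check of the "15 PB of new data each year" figure (the trigger rate and live time below are typical values I am assuming, not numbers from the slide; the ~2 MB RAW event size comes from the data-tiers slide above):

```python
# Order-of-magnitude estimate of yearly RAW data volume.
MB = 1e6
PB = 1e15

event_size = 2 * MB        # RAW event size (slide above)
trigger_rate = 300         # Hz recorded per experiment after the trigger (assumed)
live_seconds = 7e6         # effective seconds of data taking per year (assumed)
experiments = 4

raw_per_year = event_size * trigger_rate * live_seconds * experiments
print(raw_per_year / PB)   # ~17 PB, the same ballpark as the quoted 15 PB/year
```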

A collision at LHC

The Data Acquisition

Tier 0 at CERN: acquisition, first-pass reconstruction, storage & distribution. Nominal rate: 1.25 GB/sec (ions); in 2011: 4-6 GB/sec.
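To relate these sustained rates to the accumulated volumes quoted later, a small conversion sketch using only the numbers on this slide:

```python
# Converting sustained Tier-0 rates into accumulated volume.
GB = 1e9
PB = 1e15
DAY = 86_400  # seconds

for rate_gb_s in (1.25, 4.0, 6.0):           # nominal and 2011 rates (GB/s)
    per_day = rate_gb_s * GB * DAY / PB      # PB per day at that rate
    print(f"{rate_gb_s} GB/s -> {per_day:.2f} PB/day, "
          f"{per_day * 30:.0f} PB/month if sustained")
```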

WLCG – what and why?
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments. It is managed and operated by a worldwide collaboration between the experiments and the participating computer centres. The resources are distributed, for funding and sociological reasons; our task was to make use of the resources available to us, no matter where they are located.
– Tier-0 (CERN): data recording, initial data reconstruction, data distribution
– Tier-1 (11 centres): permanent storage, re-processing, analysis
– Tier-2 (~130 centres): simulation, end-user analysis
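The division of labour can be written down as a simple lookup; the sketch below is purely illustrative (the mapping is taken from this slide, the helper function is hypothetical):

```python
# Tier roles in the original WLCG model (from the slide above).
TIER_ROLES = {
    "Tier-0": ["data recording", "initial data reconstruction", "data distribution"],
    "Tier-1": ["permanent storage", "re-processing", "analysis"],
    "Tier-2": ["simulation", "end-user analysis"],
}

def tiers_for(task):
    """Return which tiers take on a given kind of work."""
    return [tier for tier, roles in TIER_ROLES.items() if task in roles]

print(tiers_for("analysis"))     # ['Tier-1']
print(tiers_for("simulation"))   # ['Tier-2']
```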

e-Infrastructure

WLCG Grid Sites (Tier 0, Tier 1, Tier 2). Today: >140 sites, >250k CPU cores, >150 PB disk.

WLCG Collaboration Status: Tier 0 at CERN; 11 Tier 1s (Lyon/CCIN2P3, Barcelona/PIC, DE-FZK, US-FNAL, CA-TRIUMF, NDGF, US-BNL, UK-RAL, Taipei/ASGC, Amsterdam/NIKHEF-SARA, Bologna/CNAF); 68 Tier 2 federations. Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep, Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.

Original Computing model

From testing to data: independent Experiment Data Challenges and Service Challenges.
Service Challenges, proposed in 2004, demonstrated the service aspects: data transfers for weeks on end, data management, scaling of job workloads, security incidents ("fire drills"), interoperability, support processes.
– SC1, SC2: basic transfer rates
– SC3: sustained rates, data management, service reliability
– SC4: nominal LHC rates, disk → tape tests, all Tier 1s, some Tier 2s
– CCRC'08: readiness challenge, all experiments, ~full computing models
– STEP'09: scale challenge, all experiments, full computing models, tape recall + analysis
The focus was on real and continuous production use of the service over several years (simulations since 2003, cosmic ray data, etc.). The Data and Service challenges exercised all aspects of the service, not just data transfers but workloads, support structures, etc. For example, DC04 (ALICE, CMS, LHCb) and DC2 (ATLAS) in 2004 saw the first full chain of the computing models on grids.

WLCG data in 2010, 2011 and 2012: about 38 PB of data were accumulated in 2010-2011, and roughly 30 PB more are expected in 2012. Data rates to tape are in excess of the original plans: up to 6 GB/s overall during heavy-ion running (cf. the nominal 1.25 GB/s), with ALICE heavy-ion data alone going into Castor at more than 4 GB/s. 23 PB of data were written in 2011, and in 2012 the rate is about 3 PB/month.

Grid Usage
Large numbers of analysis users: ATLAS and CMS ~1000 each, LHCb and ALICE ~250 each. Use remains consistently high: >1.5 M jobs/day and ~150k CPU in continuous use (about 10^9 HS06-hours/month). As well as LHC data, large simulation productions are always ongoing.
CPU used at Tier 1s + Tier 2s (HS06-hours/month) over the last 24 months: at the end of 2010 we saw all Tier 1 and Tier 2 job slots being filled, and CPU usage is now more than double that of mid-2010 (the plot inset showed the build-up over previous years). In 2011 WLCG delivered ~150 CPU-millennia!
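How these figures hang together, as a rough sketch (the HS06-per-core conversion factor is an assumption, roughly typical of 2011-era hardware):

```python
# Relating HS06-hours/month to 'CPUs in continuous use' and CPU-millennia.
hs06_hours_per_month = 1e9
hs06_per_core = 10          # assumed average HS06 rating of one core (~2011)
hours_per_month = 730

cores_in_continuous_use = hs06_hours_per_month / (hs06_per_core * hours_per_month)
print(round(cores_in_continuous_use))       # ~137,000, i.e. the quoted ~150k CPU

core_years_2011 = cores_in_continuous_use * 1.0   # one full year of use
print(core_years_2011 / 1000)               # ~137 core-millennia, cf. '~150 CPU-millennia'
```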

Tier usage vs. pledges: we use everything we are given!

CPU – around the Tiers. The grid really works: all sites, large and small, can contribute, and their contributions are needed!

Data transfers: global transfers exceed 10 GB/s (over 1 day); the plots show global transfers over the last month and CERN → Tier 1 transfers over the last 2 weeks.

LHC Networking: relies on the OPN, GEANT and US-LHCNet, plus NRENs and other national & international providers.

e-Infrastructure

Today's Grid Services
– Data Management Services: Storage Element, File Catalogue Service, File Transfer Service, Grid file access tools, GridFTP service, Database and DB Replication Services, POOL Object Persistency Service
– Job Management Services: Compute Element, Workload Management Service, VO Agent Service, Application Software Install Service
– Security Services: Certificate Management Service, VO Membership Service, Authentication Service, Authorization Service
– Information Services: Information System, Messaging Service, Site Availability Monitor, Accounting Service, monitoring tools (experiment dashboards, site monitoring)
Experiments invested considerable effort into integrating their software with grid services and hiding complexity from users.

Technical evolution: background. Consider that:
– Computing models have evolved; we have a far better understanding of requirements now than 10 years ago, and they have evolved even since the large-scale challenges
– Experiments have developed various workarounds to manage shortcomings in the middleware; pilot jobs and central task queues are (almost) ubiquitous
– Operational effort is often too high; lots of services were not designed for redundancy, fail-over, etc.
– Technology evolves rapidly, and the rest of the world also does (large-scale) distributed computing, so we don't need entirely home-grown solutions
– We must be concerned about long-term support and where it will come from

Computing model evolution: the computing models have evolved from a hierarchy to a mesh.
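A minimal sketch of what the hierarchy-to-mesh change means for data placement, with made-up site names: in the hierarchical model a Tier-2 reads only from its associated Tier-1, while in the mesh model any site holding a replica is a valid source.

```python
# Illustrative sketch of hierarchy vs mesh data placement (hypothetical site names).

HIERARCHY = {            # Tier-2 -> its associated Tier-1
    "T2_Lyon_A": "T1_Lyon",
    "T2_Munich": "T1_FZK",
}

def sources_hierarchy(t2_site, replicas):
    """Hierarchical model: a Tier-2 may only read from its own Tier-1."""
    t1 = HIERARCHY[t2_site]
    return [s for s in replicas if s == t1]

def sources_mesh(t2_site, replicas):
    """Mesh model: any site holding a replica is a valid source."""
    return [s for s in replicas if s != t2_site]

replicas = ["T1_FZK", "T2_Munich", "T1_RAL"]     # sites holding the dataset
print(sources_hierarchy("T2_Lyon_A", replicas))  # [] -> must wait for T1_Lyon to stage it
print(sources_mesh("T2_Lyon_A", replicas))       # all three sites are candidates
```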

Connectivity challenge: not just bandwidth. We are a global collaboration, but well-connected countries do better. We need to effectively connect everyone that wants to participate in LHC science: there are large actual and potential communities in the Middle East, Africa, Asia and Latin America, but also on the edges of Europe.

Impact of the LHC Computing Grid. WLCG has been leveraged on both sides of the Atlantic to benefit the wider scientific community: in Europe, Enabling Grids for E-sciencE (EGEE) → European Grid Infrastructure (EGI); in the USA, the Open Science Grid (OSG) (+ extension?). Many scientific applications: archeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, …

Spectrum of grids, clouds, supercomputers, etc.
– Grids: collaborative environment; distributed resources (political/sociological); commodity hardware (also supercomputers); (HEP) data management; complex interfaces (a bug, not a feature)
– Supercomputers: expensive; low-latency interconnects; applications peer reviewed; parallel/coupled applications; traditional interfaces (login); also SC grids (DEISA, Teragrid)
– Clouds: proprietary (implementation); economies of scale in management; commodity hardware; virtualisation for service provision and for encapsulating the application environment; details of physical resources hidden; simple interfaces (too simple?)
– Volunteer computing: simple mechanism to access millions of CPUs; difficult if (much) data is involved; control of environment → check; community building, people involved in science; potential for huge amounts of real work
Many different problems are amenable to different solutions; there is no right answer. Consider ALL of these as a combined e-Infrastructure ecosystem: aim for interoperability, combine the resources into a consistent whole, and keep applications agile so they can operate in many environments.

Grid vs. Cloud??
A grid is a distributed computing service: it integrates distributed resources, offers global single sign-on (use the same credential everywhere) and enables (virtual) collaboration.
A cloud is a large (remote) data centre: economy of scale by centralizing resources in large centres, and virtualisation enabling dynamic provisioning of resources.
The technologies are not exclusive: in the future our collaborative grid sites will use cloud technologies (virtualisation etc.), and we will also use cloud resources to supplement our own.

Grids → clouds??
We have a grid because we need to collaborate and share resources, so we will always have a "grid"; our network of trust is of enormous value for us and for (e-)science in general. We also need distributed data management that supports very high data rates and throughputs, and we will continually work on these tools. But the rest can be more mainstream (open source, commercial, …):
– We use message brokers more and more for inter-process communication
– Virtualisation of our grid sites is happening, with many drivers: power, dependencies, provisioning, …
– Remote job submission could be cloud-like
– There is interest in making use of commercial cloud resources, especially for peak demand

Clouds & Virtualisation: several strategies.
Use of virtualisation in the CERN and other computer centres:
– Lxcloud pilot + CVI → a dynamic virtualised infrastructure (which may include "bare-metal" provisioning)
– No change to any grid or service interfaces (but new possibilities)
– Likely based on OpenStack
– Other WLCG sites are also virtualising their infrastructure
Investigating the use of commercial clouds ("bursting"):
– Additional resources; potential for outsourcing some services?
– Prototype with the Helix Nebula project; experiments have various activities (with Amazon, etc.)
Can cloud technology replace or supplement some grid services? More speculative: feasibility? timescales?

CERN Data Centre Numbers (from "CERN Infrastructure Evolution"):
– Systems: 7,899; Processors: 14,972; Cores: 64,623; Memory (TiB): 165; Racks: 1,070; Power consumption (kW): 2,345
– Hard disks: 62,023; Raw disk capacity (TiB): 62,660; Tape capacity (PiB): 47; Ethernet 1Gb ports: 16,773; Ethernet 10Gb ports: 622
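Two quick ratios implied by that table (plain arithmetic on the quoted numbers):

```python
# Averages implied by the CERN data centre numbers above.
systems, cores = 7_899, 64_623
disks, raw_tib = 62_023, 62_660

print(cores / systems)    # ~8 cores per system
print(raw_tib / disks)    # ~1 TiB of raw capacity per disk on average
```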

Evolution of capacity: CERN & WLCG