Dr. Ian Bird, LHC Computing Grid Project Leader. Göttingen Tier 2 Inauguration, 13th May 2008. Challenges and Opportunities

The scales 2

High Energy Physics machines and detectors 3
[Detector diagram: muon chambers, calorimeter]
√s = 14 TeV; L: … /cm²/s and … /cm²/s
2.5 million collisions per second → LVL1: 10 kHz, LVL3: … Hz → 25 MB/sec digitized recording
40 million collisions per second → LVL1: 1 kHz, LVL3: 100 Hz → 0.1 to 1 GB/sec digitized recording

LHC: 4 experiments … ready! First physics expected in autumn

The LHC Computing Challenge 5
- Signal/Noise: …
- Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year
- Compute power: event complexity × number of events × thousands of users → 100k of (today's) fastest CPUs
- Worldwide analysis & funding: computing funding locally in major regions & countries; efficient analysis everywhere → GRID technology
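To see roughly where a figure like 15 PB/year comes from, here is a back-of-envelope sketch; the event rate, event size and live time used below are illustrative assumptions, not official experiment parameters.

```python
# Back-of-envelope estimate of annual LHC raw data volume.
# All numbers below are illustrative assumptions, not official figures.

recorded_event_rate_hz = 200     # events written to storage per second, per experiment (assumed)
event_size_mb = 1.5              # average raw event size in MB (assumed)
live_seconds_per_year = 1e7      # a "physics year" of ~10^7 seconds of data taking (rule of thumb)
experiments = 4                  # ALICE, ATLAS, CMS, LHCb

raw_pb_per_year = (recorded_event_rate_hz * event_size_mb
                   * live_seconds_per_year * experiments) / 1e9   # MB -> PB

print(f"Raw data per year: ~{raw_pb_per_year:.0f} PB")   # ~12 PB with these assumptions
```

With these assumed inputs the estimate lands in the 10-15 PB/year range quoted on the slide.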

A collision at LHC 6
Luminosity: 10³⁴ cm⁻²s⁻¹
Crossing rate: 40 MHz – a bunch crossing every 25 ns
~20 events overlaying

The Data Acquisition 7

Tier 0 at CERN: acquisition, first-pass reconstruction, storage & distribution 8
1.25 GB/sec (ions)

Tier 0 – Tier 1 – Tier 2 9
Tier-0 (CERN): data recording; first-pass reconstruction; data distribution
Tier-1 (11 centres): permanent storage; re-processing; analysis
Tier-2 (>200 centres): simulation; end-user analysis
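The division of labour between the tiers can be captured in a small lookup structure; a minimal sketch, with the roles and centre counts taken from the slide:

```python
# Roles of the WLCG tiers, as listed on the slide.
WLCG_TIERS = {
    "Tier-0": {"where": "CERN", "count": 1,
               "roles": ["data recording", "first-pass reconstruction", "data distribution"]},
    "Tier-1": {"where": "11 large centres", "count": 11,
               "roles": ["permanent storage", "re-processing", "analysis"]},
    "Tier-2": {"where": ">200 centres", "count": 200,
               "roles": ["simulation", "end-user analysis"]},
}

def tiers_responsible_for(keyword):
    """Return the tiers whose role list mentions the given keyword."""
    return [name for name, info in WLCG_TIERS.items()
            if any(keyword in role for role in info["roles"])]

print(tiers_responsible_for("analysis"))   # ['Tier-1', 'Tier-2']
```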

Evolution of requirements
[Timeline: LHC approved → ATLAS & CMS approved → ALICE approved → LHCb approved → "Hoffmann" Review → Computing TDRs → LHC start]
ATLAS & CMS CTP: 10⁷ MIPS, 100 TB disk
"Hoffmann" Review: 7×10⁷ MIPS, 1,900 TB disk (ATLAS (or CMS) requirements for first year at design luminosity)
Computing TDRs: 55×10⁷ MIPS, 70,000 TB disk (140 MSi2K)

Evolution of CPU Capacity at CERN
[Chart: CPU capacity and costs (2007 Swiss Francs) across the accelerator eras SC (0.6 GeV), PS (28 GeV), ISR (300 GeV), SPS (400 GeV), ppbar (540 GeV), LEP (100 GeV), LEP II (200 GeV), LHC (14 TeV); includes infrastructure costs (computer centre, power, cooling, ...) and physics tapes]
Tape & disk requirements: more than 10 times what CERN alone can provide

Evolution of Grids 12
[Timeline: EU DataGrid → LCG 1 → LCG 2 → EGEE 1 → EGEE 2 (Europe); GriPhyN, iVDGL, PPDG → GRID 3 → OSG (US); together forming the WLCG. Milestones along the way: Data Challenges, Service Challenges, Cosmics, First physics]

The Worldwide LHC Computing Grid
Purpose: develop, build and maintain a distributed computing environment for the storage and analysis of data from the four LHC experiments; ensure the computing service … and common application libraries and tools
Phase I – Development & planning
Phase II – Deployment & commissioning of the initial services

WLCG Collaboration 14
The Collaboration: 4 LHC experiments; ~250 computing centres; 12 large centres (Tier-0, Tier-1); 56 federations of smaller "Tier-2" centres; growing to ~40 countries; Grids: EGEE, OSG, NorduGrid
Technical Design Reports: WLCG and 4 experiments, June 2005
Memorandum of Understanding: agreed in October 2005
Resources: 5-year forward look
MoU signing status:
Tier 1 – all have now signed
Tier 2: Australia, Belgium, Canada*, China, Czech Rep.*, Denmark, Estonia, Finland, France, Germany(*), Hungary*, Italy, India, Israel, Japan, JINR, Korea, Netherlands, Norway*, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden*, Switzerland, Taipei, Turkey*, UK, Ukraine, USA
Still to sign: Austria; Brazil (under discussion)
(* recent additions)

WLCG Service Hierarchy 15
Tier-0 – the accelerator centre: data acquisition & initial processing; long-term data curation; distribution of data → Tier-1 centres
Tier-1 – "online" to the data acquisition process: high availability; managed mass storage – grid-enabled data service; data-heavy analysis; national, regional support
Tier-1 centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY)
Tier-2: ~130 centres in ~35 countries; end-user (physicist, research group) analysis – where the discoveries are made; simulation

Recent grid use
Tier 2: 54%; Tier 1: 35%; CERN: 11% – across all grid infrastructures (EGEE, OSG, NorduGrid)
The grid concept really works – all contributions, large & small, are essential!

Recent grid activity
These workloads (reported across all WLCG centres) are at the level anticipated for 2008 data taking.
WLCG ran ~44 M jobs in 2007 – the workload has continued to increase: 29 M jobs so far in 2008, now at >300k jobs/day.
The distribution of work across Tier 0 / Tier 1 / Tier 2 really illustrates the importance of the grid system: the Tier 2 contribution is around 50%, and >85% is external to CERN.
[Chart annotations: ~230k/day, ~300k/day]
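The per-year and per-day numbers quoted here are easy to cross-check; a quick sketch using only the figures from the slide:

```python
# Quick consistency check of the workload figures quoted on the slide.
jobs_2007 = 44e6                       # ~44 million jobs run in 2007
avg_per_day_2007 = jobs_2007 / 365
print(f"2007 average: ~{avg_per_day_2007 / 1e3:.0f}k jobs/day")   # ~121k jobs/day

current_rate = 300e3                   # ~300k jobs/day quoted for 2008
growth = current_rate / avg_per_day_2007
print(f"2008 rate is ~{growth:.1f}x the 2007 average")            # ~2.5x
```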

LHCOPN Architecture 18

Data Transfer out of Tier-0 19
Target for 2008: … GB/s

Production Grids 20
WLCG relies on a production-quality infrastructure:
- Requires standards of availability/reliability, performance, manageability
- Will be used 365 days a year... (has been for several years!)
- Tier 1s must store the data for at least the lifetime of the LHC, ~20 years – not passive: requires active migration to newer media
Vital that we build a fault-tolerant and reliable system that can deal with individual sites being down and recover
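One way to see why tolerance of individual site outages matters: with many sites, "everything up at once" is rare even when each site is quite reliable, while "most of the capacity up" is the norm. A minimal sketch with assumed availability figures (not measured WLCG numbers):

```python
# Illustration of why fault tolerance matters more than perfect sites.
# The 95% per-site availability and 100-site count are assumptions for the example.
from math import comb

site_availability = 0.95
n_sites = 100

p_all_up = site_availability ** n_sites
print(f"Probability that all {n_sites} sites are up at once: {p_all_up:.4f}")   # ~0.0059

# Probability that at least 90% of the sites are up (binomial tail, assuming independence):
p_90pct = sum(
    comb(n_sites, k) * site_availability**k * (1 - site_availability)**(n_sites - k)
    for k in range(90, n_sites + 1)
)
print(f"Probability that at least 90 of {n_sites} sites are up: {p_90pct:.3f}")  # ~0.989
```

Under these assumptions the system is almost never fully intact, yet almost always has 90% or more of its sites available, which is exactly the regime a fault-tolerant service is designed for.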

The EGEE Production Infrastructure
Test-beds & Services: Production Service; Pre-production service; Certification test-beds (SA3)
Support Structures & Processes: Operations Coordination Centre; Regional Operations Centres; Global Grid User Support; EGEE Network Operations Centre (SA2); Operational Security Coordination Team; Operations Advisory Group (+NA4)
Security & Policy Groups: Joint Security Policy Group; EuGridPMA (& IGTF); Grid Security Vulnerability Group
Training: Training infrastructure (NA4); Training activities (NA3)

Site Reliability 22
Monthly values, Sep 07 – Feb 08:
All sites: 89%, 86%, 92%, 87%, 89%, 84%
8 best sites: 93% … 95% … 96%
Above target (>90% target)
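A quick way to summarise the "All sites" row against the >90% target mentioned on the slide (a sketch; only the monthly values above are used):

```python
# Monthly "All sites" reliability values from the slide (Sep 07 - Feb 08).
monthly_all = [0.89, 0.86, 0.92, 0.87, 0.89, 0.84]
target = 0.90   # the ">90%" target referred to on the slide

average = sum(monthly_all) / len(monthly_all)
months_on_target = sum(1 for r in monthly_all if r >= target)

print(f"Average reliability: {average:.1%}")                                 # 87.8%
print(f"Months at or above target: {months_on_target}/{len(monthly_all)}")   # 1/6
```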

Improving Reliability
- Monitoring
- Metrics
- Workshops
- Data challenges
- Experience
- Systematic problem analysis
- Priority from software developers

Gridmap

Middleware: Baseline Services
The basic baseline services – from the TDR (2005):
- Storage Element: Castor, dCache, DPM (Storm added in 2007); SRM 2.2 deployed in production, Dec 2007
- Basic transfer tools: GridFTP, ...
- File Transfer Service (FTS)
- LCG File Catalog (LFC)
- LCG data management tools (lcg-utils)
- Posix I/O: Grid File Access Library (GFAL)
- Synchronised databases T0 → T1s: 3D project
- Information System: scalability improvements
- Compute Elements: Globus/Condor-C; improvements to LCG-CE for scale/reliability; web services (CREAM); support for multi-user pilot jobs (glexec, SCAS)
- gLite Workload Management in production
- VO Management System (VOMS)
- VO Boxes
- Application software installation
- Job Monitoring Tools
Focus now on continuing evolution of reliability, performance, functionality, requirements.
For a production grid the middleware must allow us to build fault-tolerant and scalable services: this is more important than sophisticated functionality.
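Conceptually, the File Transfer Service listed above is a managed, retrying queue of copies between storage elements. The sketch below models that idea only; it does not use the real FTS or gLite APIs, and the class, method and SURL names are invented for the illustration.

```python
# Toy model of a managed file-transfer queue, in the spirit of a service such as FTS.
# Purely illustrative: not the real FTS/gLite interface; the SURLs below are made up.
from collections import deque
from dataclasses import dataclass

@dataclass
class Transfer:
    source: str            # source SURL (hypothetical)
    destination: str       # destination SURL (hypothetical)
    attempts: int = 0
    state: str = "queued"  # queued -> active -> done / failed

class TransferQueue:
    def __init__(self, max_retries=3):
        self.max_retries = max_retries
        self.pending = deque()
        self.finished = []

    def submit(self, source, destination):
        self.pending.append(Transfer(source, destination))

    def run(self, copy_func):
        """Drain the queue, retrying failed copies up to max_retries times."""
        while self.pending:
            t = self.pending.popleft()
            t.state, t.attempts = "active", t.attempts + 1
            if copy_func(t.source, t.destination):
                t.state = "done"
                self.finished.append(t)
            elif t.attempts < self.max_retries:
                t.state = "queued"
                self.pending.append(t)      # retry later, at the back of the queue
            else:
                t.state = "failed"
                self.finished.append(t)

# Example use with a stand-in copy function that always succeeds:
q = TransferQueue()
q.submit("srm://tier0.example/raw/run123.dat", "srm://tier1.example/raw/run123.dat")
q.run(copy_func=lambda src, dst: True)
print([t.state for t in q.finished])   # ['done']
```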

Database replication
- In full production: several GB/day of user data can be sustained to all Tier 1s
- Very large distributed database deployment: ~100 DB nodes at CERN and several tens of nodes at Tier 1 sites
- Used for several applications: experiment calibration data; replicating (central, read-only) file catalogues
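For a feel of the network load behind "several GB/day to all Tier 1s", a rough sketch; the 5 GB/day per site is an assumed stand-in for "several", and spreading the traffic evenly over the day is a simplification:

```python
# Rough bandwidth implied by replicating conditions/calibration data to the Tier 1s.
# The 5 GB/day figure is an assumption; 11 Tier 1s as quoted on the earlier slides.
gb_per_day_per_site = 5
n_tier1 = 11
seconds_per_day = 86_400

total_gb_per_day = gb_per_day_per_site * n_tier1
avg_mbit_per_s = total_gb_per_day * 8 * 1000 / seconds_per_day   # GB -> Mbit, spread over a day

print(f"Total replicated per day: {total_gb_per_day} GB")
print(f"Average rate if spread evenly: ~{avg_mbit_per_s:.1f} Mbit/s")   # ~5.1 Mbit/s
```

Even with generous assumptions this is orders of magnitude below the GB/s bulk data flows, which suggests database replication is not the dominant network load.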

LCG depends on two major science grid infrastructures 27
EGEE – Enabling Grids for E-sciencE
OSG – US Open Science Grid
Interoperability & interoperation is vital; significant effort goes into building the procedures to support it.

Enabling Grids for E-sciencE (EGEE-II, INFSO-RI-…)
… sites in 45 countries; 45,000 CPUs; 12 PetaBytes; >5000 users; >100 VOs; >100,000 jobs/day
Application domains: Archeology, Astronomy, Astrophysics, Civil Protection, Comp. Chemistry, Earth Sciences, Finance, Fusion, Geophysics, High Energy Physics, Life Sciences, Multimedia, Material Sciences, …
Grid infrastructure project co-funded by the European Commission – now in 2nd phase with 91 partners in 32 countries

EGEE: Increasing workloads ⅓ non-LHC

Grid Applications 30
Medical; Seismology; Chemistry; Astronomy; Fusion; Particle Physics

Share of EGEE resources 31
5/07 – 4/08: 45 million jobs [chart of shares by discipline; HEP is the largest share]

HEP use of EGEE: May 07 – Apr 08 32

The next step

Sustainability: Beyond EGEE-II
- Need to prepare permanent, common Grid infrastructure
- Ensure the long-term sustainability of the European e-infrastructure independent of short project funding cycles
- Coordinate the integration and interaction between National Grid Infrastructures (NGIs)
- Operate the European level of the production Grid infrastructure for a wide range of scientific disciplines to link NGIs

EGI – European Grid Initiative
EGI Design Study: proposal to the European Commission (started Sept 07)
Supported by 37 National Grid Initiatives (NGIs)
2-year project to prepare the setup and operation of a new organizational model for a sustainable pan-European grid infrastructure after the end of EGEE-3

Summary 36
We have an operating production-quality grid infrastructure that:
- Is in continuous use by all 4 experiments (and many other applications);
- Is still growing in size – sites, resources (and still to finish the ramp-up for LHC start-up);
- Demonstrates interoperability (and interoperation!) between 3 different grid infrastructures (EGEE, OSG, NorduGrid);
- Is becoming more and more reliable;
- Is ready for LHC start-up.
For the future we must:
- Learn how to reduce the effort required for operation;
- Tackle upcoming infrastructure issues (e.g. power, cooling);
- Manage the migration of the underlying infrastructures to longer-term models;
- Be ready to adapt the WLCG service to new ways of doing distributed computing.