WLCG – Worldwide LHC Computing Grid. Markus Schulz, CERN-IT-GT, August 2010. Openlab Summer Students.

Overview: Background, The Infrastructure, Usage, Evolution.

Not covered today: Grid Computing Technology, gLite Middleware.

CERN in Numbers: 2328 staff, 711 Fellows and Associates, 9535 users (as of 6 January 2009). Budget (2009): 887 MCHF (595 M€). Member States: Austria, Belgium, Bulgaria, the Czech Republic, Denmark, Finland, France, Germany, Greece, Hungary, Italy, Netherlands, Norway, Poland, Portugal, Slovakia, Spain, Sweden, Switzerland and the United Kingdom. Observers to Council: India, Israel, Japan, the Russian Federation, the United States of America, Turkey, the European Commission and UNESCO.


One of our data generators: ATLAS. 150 million sensors deliver data … 40 million times per second.

Tier 0 at CERN: acquisition, first-pass processing, storage & distribution of the incoming data stream (rates at the GB/sec scale during ion runs).

Flow in and out of the centre.

Data and Algorithms. HEP data are organised as Events (particle collisions). Simulation, Reconstruction and Analysis programs process "one Event at a time"; Events are fairly independent, so processing is trivially parallel. Event-processing programs are composed of a number of Algorithms selecting and transforming "raw" Event data into "processed" (reconstructed) Event data and statistics.
Data tiers: RAW – detector digitisation, events triggered and recorded by the DAQ, ~2 MB/event; ESD/RECO – reconstructed, pseudo-physical information (clusters, track candidates), ~100 kB/event; AOD – analysis information, physical quantities (transverse momentum, association of particles, jets, particle ID), ~10 kB/event; TAG – classification information relevant for fast event selection, ~1 kB/event.
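
Because each collision is independent, the workload is embarrassingly parallel: the same algorithm chain is applied to every event, and events can be farmed out freely over worker nodes. A minimal sketch of this pattern (the class and algorithm names are illustrative, not those of any real experiment framework):

```python
# Minimal sketch of event-by-event processing with an algorithm chain.
# Event, cluster_finder and track_fitter are illustrative names only.
from dataclasses import dataclass, field
from multiprocessing import Pool
import random

@dataclass
class Event:
    raw: list                                   # "RAW" detector digitisation
    reco: dict = field(default_factory=dict)    # reconstructed quantities

def cluster_finder(event):
    # Select/transform raw data into clusters (placeholder logic).
    event.reco["clusters"] = [x for x in event.raw if x > 0.5]
    return event

def track_fitter(event):
    # Build track candidates from clusters (placeholder logic).
    event.reco["n_tracks"] = len(event.reco["clusters"]) // 3
    return event

ALGORITHMS = [cluster_finder, track_fitter]

def process(event):
    # One event at a time: run the full algorithm chain on this event.
    for algorithm in ALGORITHMS:
        event = algorithm(event)
    return event

if __name__ == "__main__":
    events = [Event(raw=[random.random() for _ in range(100)]) for _ in range(1000)]
    # Events are independent, so they can be distributed trivially.
    with Pool() as pool:
        processed = pool.map(process, events)
    print(sum(e.reco["n_tracks"] for e in processed), "tracks reconstructed")
```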

A very difficult environment: pp collisions at 14 TeV at a luminosity of 10^34 cm^-2 s^-1. The task is to extract a signal such as Higgs → 4 muons from events in which ~20 proton-proton collisions overlap, and this repeats every 25 ns – compare with the much cleaner Z events at LEP (e+e−).

The LHC Computing Challenge. Signal/noise: tiny (~10^-9 even offline). Data volume: high rate × large number of channels × 4 experiments → 15 PetaBytes of new data each year. Compute power: event complexity × number of events × thousands of users → ~200k of (today's) fastest CPUs and 45 PB of disk storage. Worldwide analysis & funding: computing is funded locally in the major regions & countries, and efficient analysis must be possible everywhere → GRID technology.
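
The 15 PB/year figure can be reproduced with a back-of-envelope estimate; the rate, event size and live time below are illustrative round numbers assumed for the sketch, not official experiment parameters:

```python
# Back-of-envelope estimate of the annual LHC data volume.
# All inputs are illustrative round numbers, not official figures.
events_per_second = 300        # recorded (post-trigger) rate per experiment
event_size_bytes = 1.5e6       # ~1.5 MB per RAW event
live_seconds_per_year = 1e7    # typical accelerator "live" time per year
experiments = 4                # ALICE, ATLAS, CMS, LHCb

bytes_per_year = (events_per_second * event_size_bytes
                  * live_seconds_per_year * experiments)
print(f"~{bytes_per_year / 1e15:.0f} PB of new RAW data per year")
# -> ~18 PB, the same order of magnitude as the ~15 PB quoted above
```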

So what is a Grid? [Figure: supercomputers, enterprise grids, grids, clouds and volunteer computing compared along two axes – "cost" or application complexity versus "collaborativeness" (number of sites: one to a few, tens to hundreds, tens of thousands to millions).] A Grid is a collaborative environment crossing many administrative and geographical boundaries and not subject to central control; it is used by Virtual Organisations: thematic groups of users. WLCG federates three grids: EGEE, OSG and NDGF.

What is Grid Middleware? For today: the glue that creates the illusion that a distributed infrastructure is a single resource. If it works, no one will notice it.
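
To make the "single resource" idea concrete, here is a toy job-dispatch sketch; the site names and free-slot counts are made up, and real middleware (e.g. the gLite workload management system) additionally handles authentication, data location, retries and much more:

```python
# Toy "resource broker": pick a site for a job so the user never needs to
# know where it runs. Purely illustrative, not any real middleware API.
SITES = {"CERN": 12, "RAL": 3, "FNAL": 0}   # free job slots (made-up numbers)

def submit(job, sites=SITES):
    # Choose the site with the most free slots; fail if the grid is full.
    site = max(sites, key=sites.get)
    if sites[site] == 0:
        raise RuntimeError("no free slots anywhere on the grid")
    sites[site] -= 1
    print(f"job '{job}' dispatched to {site}")
    return site

submit("analysis-job-01")   # the user sees one grid, not individual sites
```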

Tier 0 – Tier 1 – Tier 2: the service hierarchy. Tier-0 (CERN, ~15% of the resources): data recording, initial data reconstruction, data distribution. Tier-1 (11 centres, ~40%): permanent storage, re-processing, analysis. Tier-2 (~200 centres, ~45%): simulation, end-user analysis.
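
As a small illustration of what those shares mean in practice, the sketch below splits an overall capacity figure across the tiers; the 200k-CPU total is the figure from the previous slide, and the 15/40/45% split is the one shown above:

```python
# Rough split of an overall capacity requirement across the WLCG tiers,
# using the approximate shares quoted on the slide.
TIER_SHARES = {
    "Tier-0 (CERN)":         0.15,  # recording, first-pass reconstruction, distribution
    "Tier-1 (11 centres)":   0.40,  # permanent storage, re-processing, analysis
    "Tier-2 (~200 centres)": 0.45,  # simulation, end-user analysis
}

def split_capacity(total_cpus):
    """Return the number of CPUs implied by each tier's share."""
    return {tier: round(total_cpus * share) for tier, share in TIER_SHARES.items()}

if __name__ == "__main__":
    for tier, cpus in split_capacity(200_000).items():
        print(f"{tier:25s} ~{cpus:,} CPUs")
```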

WLCG Collaboration status: a Tier 0, 11 Tier 1s and 64 Tier-2 federations (124 Tier-2 sites). The Tier-1 centres are Lyon/CCIN2P3, Barcelona/PIC, DE-FZK, US-FNAL, CA-TRIUMF, NDGF, US-BNL, UK-RAL, Taipei/ASGC, Amsterdam/NIKHEF-SARA and Bologna/CNAF, with the Tier 0 at CERN. Today we have 49 MoU signatories, representing 34 countries: Australia, Austria, Belgium, Brazil, Canada, China, Czech Rep, Denmark, Estonia, Finland, France, Germany, Hungary, Italy, India, Israel, Japan, Rep. Korea, Netherlands, Norway, Pakistan, Poland, Portugal, Romania, Russia, Slovenia, Spain, Sweden, Switzerland, Taipei, Turkey, UK, Ukraine, USA.

History:
MONARC project – first LHC computing architecture – a hierarchical, distributed model.
2000 – growing interest in grid technology; the HEP community was a main driver in launching the EU DataGrid project.
EU DataGrid project – middleware & testbed for an operational grid.
LHC Computing Grid (LCG) – deploying the results of DataGrid to provide a production facility for the LHC experiments.
EU EGEE project, phase 1 – starts from the LCG grid; a shared production infrastructure; expanding to other communities and sciences.
EU EGEE project, phase 2 – expanding to other communities and sciences; scale and stability; interoperation/interoperability.
EU EGEE project, phase 3 – more communities; efficient operations; less central coordination.
2010 – 201x, EGI and EMI – sustainable infrastructures based on National Grid Infrastructures; decoupling of middleware development and infrastructure.

Evolution of (W)LCG. [Timeline: from the EU DataGrid and the GriPhyN, iVDGL and PPDG projects, through LCG 1, LCG 2, EGEE 1/2/3 and OSG, to today's WLCG; milestones include the approval of the LHC and of ATLAS & CMS, ALICE and LHCb, the "Hoffmann" Review, the data and service challenges, cosmics and first physics.] Ever-increasing requirements: the ATLAS & CMS CTP foresaw ~10^7 MIPS and 100 TB of disk; the "Hoffmann" Review ~7×10^7 MIPS and 1,900 TB of disk; the Computing TDRs ~55×10^7 MIPS (140 MSi2k) and 70,000 TB of disk.

Production Grids. WLCG relies on a production-quality infrastructure: it requires standards of availability/reliability, performance and manageability, and it is used 365 days a year (and has been for several years). Tier 1s must store the data for at least the lifetime of the LHC – ~20 years. That is not passive: it requires active migration to newer media. It is vital that we build a fault-tolerant and reliable system, one that can deal with individual sites being down and recover.
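
Active migration means continually copying data onto new media and verifying that nothing was corrupted along the way. A minimal sketch of that verification step, assuming ordinary files on disk and SHA-256 checksums (real Tier-1 workflows use tape systems and checksums stored in file catalogues):

```python
# Minimal sketch: verify that a file copied to new media matches the original.
# Illustrative only; production systems work against tape and catalogues.
import hashlib
import shutil
from pathlib import Path

def checksum(path: Path, chunk_size: int = 1 << 20) -> str:
    h = hashlib.sha256()
    with path.open("rb") as f:
        while chunk := f.read(chunk_size):
            h.update(chunk)
    return h.hexdigest()

def migrate(src: Path, dst: Path) -> None:
    """Copy src to dst and fail loudly if the copy does not match."""
    shutil.copy2(src, dst)
    if checksum(src) != checksum(dst):
        raise IOError(f"checksum mismatch after migrating {src} -> {dst}")
```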

[Map: the Tier 0 at CERN and the Tier-1 centres – FZK, FNAL, TRIUMF, NDGF, Barcelona/PIC, Lyon/CCIN2P3, Bologna/CNAF, Amsterdam/NIKHEF-SARA, BNL, RAL, Taipei/ASGC.]

What is needed to make it work? Apart from middleware, apart from computer centres …

Everything you need in a computer centre: management, fabric, networking, security, monitoring, user support, problem tracking, accounting, service support, SLAs, … but now on a global scale – respecting the sites' independence and linking the different infrastructures (NDGF, EGEE/EGI, OSG).

Worldwide LCG organisation. Collaboration Board (CB): experiments and regional centres. Overview Board (OB). Management Board: management of the project. LHC Committee (LHCC): scientific review. Computing Resources Review Board (C-RRB): funding agencies; Resource Scrutiny Group (C-RSG). Grid Deployment Board: coordination of grid operations. Architects Forum: coordination of common applications. EGEE and OSG are represented. Activity areas: physics applications software, service & support, grid deployment, computing fabric.

The EGEE Infrastructure. Test-beds & services: production service, pre-production service, certification test-beds (SA3). Support structures & processes: Operations Coordination Centre, Regional Operations Centres, Global Grid User Support, EGEE Network Operations Centre (SA2), Operational Security Coordination Team, Operations Advisory Group (+NA4). Security & policy groups: Joint Security Policy Group, EuGridPMA (& IGTF), Grid Security Vulnerability Group. Training infrastructure (NA4), training activities (NA3).

Operations Infrastructure. Regional Operations Centres (ROCs): little central coordination, policy driven, with the CIC portal as a common information point connecting users and admins. Problem tracking: GGUS (Global Grid User Support) – web and mail interfaces, handling ~1000 tickets every month, interfaced to local tools for ticket acceptance. Accounting: APEL – a central DB plus portal (UK & Spain), with interfaces to other systems: DGAS, OSG, NDGF.

In addition to EGEE/EGI operations: daily WLCG operations meetings – 30 minutes, following up on current problems – and, every two weeks, the WLCG T1 Service Coordination meeting – operational planning, incident follow-up, detailed monitoring of the SLAs.

Global Grid User Support (GGUS): a web-based portal; about 1000 tickets per month; grid-security aware; interfaced to regional/national support structures.

Reliabilities. This is not the full picture: experiment-specific measures give a complementary view, and the numbers need to be used together with some understanding of the underlying issues.

Availability metrics - GridView
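
GridView aggregates the results of the availability tests into availability and reliability numbers; roughly, availability is the fraction of all time a site passes its tests, while reliability excludes scheduled downtime from the denominator. A sketch under that approximate definition (the exact WLCG algorithm has more states, e.g. "unknown"):

```python
# Approximate availability/reliability calculation in the spirit of GridView.
# Only OK, failed and scheduled-downtime hours are distinguished here.
def site_metrics(ok_hours, failed_hours, scheduled_downtime_hours):
    total = ok_hours + failed_hours + scheduled_downtime_hours
    availability = ok_hours / total
    # Scheduled downtime is not held against the site for reliability.
    reliability = ok_hours / (total - scheduled_downtime_hours)
    return availability, reliability

# Example: a 720-hour month with 20 h of failures and 10 h of planned maintenance.
a, r = site_metrics(ok_hours=690, failed_hours=20, scheduled_downtime_hours=10)
print(f"availability = {a:.1%}, reliability = {r:.1%}")
```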

Monitoring to improve reliability: monitoring, metrics, workshops, data challenges, experience, systematic problem analysis, and priority from software developers.

GridMap visualisation: a grid topology view (grouping) in which the size and colour of each rectangle are selectable metrics (e.g. SAM status or GridView availability data); VO selection; overall site or site-service selection; drill-down into a region by clicking on its title; context-sensitive information; a colour key and a description of the current view.

Grid Monitoring – the critical activity to achieve reliability: "… to help improve the reliability of the grid infrastructure …". System management (fabric management, best practices, security, …): "improving system management practices" – provide site-manager input to requirements on grid monitoring and management tools, propose existing tools to the grid monitoring working group, produce a Grid Site Fabric Management cook-book, and identify training needs. Grid services (grid sensors, transport, repositories, views, …): "… provide stakeholders with views of the infrastructure allowing them to understand the current and historical status of the service …". System analysis (application monitoring, …): "… to gain understanding of application failures in the grid environment and to provide an application view of the state of the infrastructure …".

Monitoring. Availability/reliability monitoring: the SAM tests and infrastructure, now migrated to a decentralised, NAGIOS-based system. Visualisation: GridView, GridMap, dashboards. The monitoring infrastructure is a solid foundation.
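
The NAGIOS-based system works by running small probes against each service; a probe is just a program that prints a status line and returns one of the standard Nagios exit codes. A minimal sketch of such a probe (the endpoint is a placeholder; real SAM/Nagios probes test actual grid services such as storage and compute elements):

```python
#!/usr/bin/env python
# Minimal Nagios-style probe: check that a service endpoint answers over HTTP.
# The URL is a placeholder, not a real WLCG service.
import sys
import urllib.request

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3   # standard Nagios exit codes

def main(url="https://example.org/health"):
    try:
        with urllib.request.urlopen(url, timeout=10) as resp:
            if resp.status == 200:
                print(f"OK - {url} answered with HTTP 200")
                return OK
            print(f"WARNING - {url} answered with HTTP {resp.status}")
            return WARNING
    except Exception as exc:
        print(f"CRITICAL - {url} unreachable: {exc}")
        return CRITICAL

if __name__ == "__main__":
    sys.exit(main(*sys.argv[1:]))
```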

Security & Policy. Collaborative policy development – many policy aspects are collaborative works, e.g. the Joint Security Policy Group and the Certification Authorities (EUGridPMA → IGTF, etc.). Grid Acceptable Use Policy (AUP): a common, general and simple AUP for all VO members using many Grid infrastructures (EGEE, OSG, SEE-GRID, DEISA, national Grids, …). Incident handling and response: defines the basic communication paths and the requirements (MUSTs) for incident response, without replacing or interfering with local response plans. The policy set covers: security & availability policy, usage rules, certification authorities, audit requirements, incident response, user registration & VO management, application development & network admin guide, VO security. Security & policy groups: Operations Advisory Group (+NA4), Joint Security Policy Group, EuGridPMA (& IGTF), Grid Security Vulnerability Group.

Security groups. Joint Security Policy Group: joint with WLCG, OSG and others; focus on policy issues; strong input to e-IRG. EUGridPMA: the pan-European trust federation of CAs, included in IGTF (and the model for it) alongside TAGPMA (the Americas) and APGridPMA (Asia-Pacific); a success – most grid projects now subscribe to the IGTF. Grid Security Vulnerability Group: looking at how to manage vulnerabilities; risk analysis is fundamental; it is hard to balance openness against giving away insider information. Operational Security Coordination Team: the main day-to-day operational security work; incident response and follow-up; members in all ROCs and sites; frequent tests (security challenges).

The new European network backbone: an LCG working group with the Tier-1s and national/regional research network organisations. The new GÉANT 2 research-network backbone shows a strong correlation with the major European LHC centres (Swiss PoP at CERN); core links are fibre; two 622 Mbps circuits to Israel.

Usage.

A global, multi-science infrastructure, mission-critical for many communities: archaeology, astronomy, astrophysics, civil protection, computational chemistry, earth sciences, finance, fusion, geophysics, high energy physics, life sciences, multimedia, material sciences, … >340 sites in 48 countries; >170,000 CPUs; >25 PetaBytes of disk; >10,000 users; >170 communities; >500,000 jobs/day. The number of jobs from 2004 to 2009 shows the rapid growth of the infrastructure.

Grid activity – delivered CPU. The distribution of work across Tier 0 / Tier 1 / Tier 2 really illustrates the importance of the grid system: the Tier-2 contribution is ~50%, and more than 85% of the CPU is delivered outside CERN.

Data transfers: CERN to the Tier 1s.

EGEE achievements – applications. More than 270 VOs from several scientific domains: astronomy & astrophysics, civil protection, computational chemistry, computational fluid dynamics, computer science/tools, condensed matter physics, earth sciences, fusion, high energy physics, life sciences. Further applications are under evaluation. Applications have moved from testing to routine, daily usage.

EGEE registered collaborating projects: applications (improved services for academia, industry and the public), support actions (key complementary functions) and infrastructures (geographical or thematic coverage). 25 projects had registered as of September 2007.

Future.

The European e-Infrastructure: the European DataGrid (EDG) explored the concepts in a testbed; Enabling Grids for E-sciencE (EGEE) moved from prototype to production; the European Grid Infrastructure (EGI) provides routine usage of a sustainable e-infrastructure.

[Map: the NGIs (National Grid Infrastructures) in Europe.]

What is an e-Infrastructure? Resources linked by high-speed networks – compute, storage, instruments, … Controlled access to shared resources – authentication, authorisation, accounting, … Dependable services for others to use – driven by availability and reliability metrics. Services that are there for the long term – supporting experiments lasting decades.

Moving from EGEE to EGI. What is different? EGEE did 'everything'; EGI focuses just on infrastructure operations. What is the same? Running a 24/7 production-quality infrastructure, providing a support framework for the users, and collaborating to drive the European DCI forward.

What other changes? Virtualisation (fabric); integration with clouds, both commercial and community; the T0–T1–T2 connection reaches a new quality – a less hierarchical system; addressing interoperability; the end of Moore's law? …

Summary. WLCG delivers the computing resources needed by the experiments. The infrastructure grew exponentially for several years, while support effort was reduced and reliability improved. The main challenges ahead are scaling, changes in technology, and organisational changes.