Les Robertson, WLCG Project Leader
WLCG – Worldwide LHC Computing Grid
Where we are now & the Challenges of Real Data
CHEP 2007, Victoria BC, 3 September 2007

WLCG – The Collaboration: 4 Experiments plus

Tier-0 – the accelerator centre
- Data acquisition & initial processing
- Long-term data curation
- Distribution of data to the Tier-1 centres

11 Tier-1 Centres: Canada – TRIUMF (Vancouver); France – IN2P3 (Lyon); Germany – Forschungszentrum Karlsruhe; Italy – CNAF (Bologna); Netherlands – NIKHEF/SARA (Amsterdam); Nordic countries – distributed Tier-1; Spain – PIC (Barcelona); Taiwan – Academia Sinica (Taipei); UK – CLRC (Oxford); US – FermiLab (Illinois) and Brookhaven (NY)
- "Online" to the data acquisition process – high availability
- Managed Mass Storage – grid-enabled data service
- Data-heavy analysis
- National, regional support

Tier-2 – 112 Centres in 53 Federations in 26 countries
- End-user (physicist, research group) analysis – where the discoveries are made
- Simulation
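For orientation, the tier roles above can be summarised in a small data structure. This is only an illustrative sketch built from the figures on the slide, not anything defined by the WLCG software itself.

```python
# Illustrative sketch of the WLCG tier model described above.
# The roles and counts come from the slide; the structure is only for clarity.
WLCG_TIERS = {
    "Tier-0": {
        "sites": 1,  # CERN, the accelerator centre
        "roles": ["data acquisition & initial processing",
                  "long-term data curation",
                  "distribution of data to the Tier-1 centres"],
    },
    "Tier-1": {
        "sites": 11,
        "roles": ["online to the data acquisition process (high availability)",
                  "managed mass storage (grid-enabled data service)",
                  "data-heavy analysis",
                  "national and regional support"],
    },
    "Tier-2": {
        "sites": 112,  # in 53 federations in 26 countries
        "roles": ["end-user (physicist, research group) analysis",
                  "simulation"],
    },
}

if __name__ == "__main__":
    for tier, info in WLCG_TIERS.items():
        print(f"{tier}: {info['sites']} site(s) -- " + "; ".join(info["roles"]))
```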

WLCG depends on two major science grid infrastructures: EGEE – Enabling Grids for E-sciencE, and OSG – the US Open Science Grid.

The Worldwide LHC Computing Grid – Does it work?

CPU Usage accounted to LHC Experiments
- Accounted/pledged over the past 12 months: 62%, 48%
- Ramp-up needed over the next 8 months: 6x, 4x

CPU Usage accounted to LHC Experiments – July 2007
- Tier-2s: 45%, 11 Tier-1s: 35%, CERN: 20%
- 530M SI2K-days/month (CPU)
- 9 PB disk at CERN + Tier-1s
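As a rough sanity check of what these figures imply, here is a back-of-the-envelope sketch that applies the quoted ramp-up factors directly to the July 2007 accounted CPU, assuming the split across tiers stays the same. The actual 2008 targets are not shown on these slides, so the numbers printed are only illustrative.

```python
# Back-of-the-envelope sketch only: scale the July 2007 accounted CPU usage
# by the quoted ramp-up factors. The real 2008 targets are not given on the
# slide, and the tier shares are assumed unchanged, so the products are
# purely illustrative.
july_2007_cpu = 530e6  # SI2K-days/month, from the slide
shares = {"Tier-2s": 0.45, "Tier-1s": 0.35, "CERN": 0.20}

for factor in (4, 6):  # ramp-up range quoted for the next 8 months
    total = july_2007_cpu * factor
    print(f"ramp-up x{factor}: ~{total / 1e9:.1f}G SI2K-days/month in total")
    for centre, share in shares.items():
        print(f"  {centre}: ~{total * share / 1e6:.0f}M SI2K-days/month")
```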

[Chart: jobs/month at sites reporting to the GOC repository at RAL, with ATLAS+CMS targets (jobs/month) for end 2007 and mid 2008; only the 9M jobs/month figure survived extraction]

CMS Dashboard – CRAB Analysis Jobs: top 20 of 88 sites running at least one job. Mid-July to mid-August 2007 – 645K jobs (20K jobs/day) – 89% grid success rate.

2007 – CERN → Tier-1 Data Distribution
[Chart: average data rate per day by experiment (MBytes/sec), January–August 2007]
Need a factor of 2-3 when the accelerator is running (achieved last year for basic file transfers – but this year tests are under more realistic experiment conditions).

All sites → all sites data transfer
Overall within 50% of the 2008 target, but not every site is reliable and taking its share. [Chart annotation: CSA06]

Baseline Services – the basic baseline services from the TDR (2005):
- Storage Element – Castor, dCache, DPM (with SRM 1.1); StoRM added in 2007; SRM 2.2 spec agreed in May, being deployed now
- Basic transfer tools – GridFTP, ...
- File Transfer Service (FTS)
- LCG File Catalog (LFC)
- LCG data management tools – lcg-utils
- Posix I/O – Grid File Access Library (GFAL)
- Synchronised databases T0 → T1s – 3D project
- Information System
- Compute Elements – Globus/Condor-C and web services (CREAM)
- gLite Workload Management – in production at CERN
- VO Management System (VOMS)
- VO Boxes
- Application software installation
- Job Monitoring Tools
... continuing evolution: reliability, performance, functionality, requirements
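To show how several of these data management pieces fit together from a user's point of view, here is a minimal sketch that shells out to the lcg-utils command-line tools. The VO, Storage Element and logical file name are placeholders, and option spellings may differ between lcg-utils releases; treat it as an illustration, not a recipe.

```python
# Illustrative sketch only: copy a file to a Storage Element and register it
# in the LCG File Catalog using the lcg-utils CLI. The VO, SE host and LFN
# below are placeholders; exact options may vary between lcg-utils releases.
import subprocess

VO = "myvo"                                   # placeholder VO name
LOCAL_FILE = "file:/tmp/example.root"         # placeholder local file
DEST_SE = "se.example.org"                    # placeholder Storage Element
LFN = "lfn:/grid/myvo/user/example.root"      # placeholder logical file name

def run(cmd):
    """Run a grid command, echoing it first, and raise if it fails."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)

# Copy the local file to the SE and register it in the LFC in one step.
run(["lcg-cr", "--vo", VO, "-d", DEST_SE, "-l", LFN, LOCAL_FILE])

# Later, copy the registered file back to local disk by its logical name.
run(["lcg-cp", "--vo", VO, LFN, "file:/tmp/example_copy.root"])
```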

ATLAS M4 data taking, August 31 – from the detector to analysis at Tier-2s (slide from Kors Bos, WLCG Collaboration Workshop, Victoria, Sept 2007)
[Charts: throughput (MB/s), data transferred (GB), completed file transfers, total number of errors]

Reliability? SAM "critical" system tests – Site Reliability of Tier-2 Sites: 83 Tier-2 sites being monitored.
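To make the metric concrete, here is a simplified sketch of how a per-site reliability figure can be derived from SAM-style critical test results. The real WLCG algorithm also folds in scheduled downtime, and the site and test names below are made up.

```python
# Simplified sketch: derive a per-site "reliability" figure from SAM-style
# critical test results. The real WLCG algorithm also accounts for scheduled
# downtime; this is only the basic passed/total fraction. Names are made up.
from collections import defaultdict

# (site, test_name, passed) tuples -- hypothetical sample results
results = [
    ("T2_Example_A", "CE-job-submit", True),
    ("T2_Example_A", "SRM-put", False),
    ("T2_Example_B", "CE-job-submit", True),
    ("T2_Example_B", "SRM-put", True),
]

passed = defaultdict(int)
total = defaultdict(int)
for site, _test, ok in results:
    total[site] += 1
    passed[site] += ok

for site in sorted(total):
    print(f"{site}: reliability = {passed[site] / total[site]:.0%}")
```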

User Job Efficiency
- Job success rate – excluding application errors
- Measured by job log analysis
- At present only for jobs submitted via the EGEE workload management system
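As an illustration of the bookkeeping such a log analysis implies, here is a minimal sketch that counts grid failures separately from application failures, so that the grid success rate excludes application errors. The job records and status labels are hypothetical, not the actual dashboard schema.

```python
# Minimal sketch, not the actual dashboard code: compute a grid success rate
# from job records, excluding application errors as described above.
# The records and status labels are hypothetical.
jobs = [
    {"id": 1, "grid_status": "Done", "app_exit_code": 0},        # fully successful
    {"id": 2, "grid_status": "Done", "app_exit_code": 1},        # application error only
    {"id": 3, "grid_status": "Aborted", "app_exit_code": None},  # grid failure
]

# A job counts as a grid success if the grid delivered it, regardless of the
# application exit code.
grid_ok = sum(1 for j in jobs if j["grid_status"] == "Done")
grid_success_rate = grid_ok / len(jobs)

app_errors = sum(1 for j in jobs
                 if j["grid_status"] == "Done" and j["app_exit_code"] != 0)

print(f"grid success rate (excluding application errors): {grid_success_rate:.0%}")
print(f"application errors (not counted against the grid): {app_errors}")
```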

Reliability
- Operational complexity is now the weakest link
- Inconsistent error reporting – confused by many layers of software: local system, grid middleware, application layers
- Communications are difficult:
  - sites supporting several (not just LHC) experiments and sometimes other sciences
  - experiments coordinating across a large number of sites
  - multiple sites and services implicated in difficult data problems
- Sites have different histories, different procedures, different priorities
- A major effort now on monitoring**
  - integrating grid monitoring with site operations
  - experiment-specific dashboards for experiment operations and end-users
  - ... and on standard metrics – comparing sites, experiments
** Session on monitoring – Grid Middleware and Tools – Wednesday afternoon

Reminder – one of the conclusions from the plenary talk at CHEP’04 by Fabiola Gianotti

Are we approaching the Plateau of Productivity?
[Figure: Gartner Group hype cycle, with the HEP Grid plotted along the CHEP timeline – Padova, Beijing, San Diego, Interlaken, Mumbai, Victoria?]

Middleware & services:
- Initial goals were over-ambitious – but we now have basic functionality, tools, services
- SRM 2.2 is late – and storage management is hard
- Experiments have to live with the functionality that we have now

Usage:
- Experiments are running large numbers of jobs – despite their (justified) complaints
- And transferring large amounts of data – though not always to where they want it
- ATLAS has taken cosmic data from the detector to analysis at Tier-2s
- End-users are beginning to run analysis jobs – but sites need to understand much better how analysis will be done during the first couple of years, and what the implications are for data

Scalability:
- 5-6x needed for resource capacity, number of jobs
- 2-3x needed for data transfer
- Live for now with the functionality we have
- Need to understand better how analysis will be done

Reliability:
- Not yet good enough
- Data transfer is still the most worrying – despite many years of planning and testing
- Many errors → complicated recovery procedures
- Many sources of error – storage systems, site operations, experiment data management systems, databases, grid middleware and services, networks, ... Hard to get to the roots of the problems

Are we getting there? Slowly!
Need continuous testing from now until first beams:
- driven by experiments with realistic scenarios, good monitoring and measurements
- and the pro-active participation of sites, developers, storage experts
After so many years, the beams are now on the horizon and we can all focus on the contribution that we can make to extracting the physics.