The LHC Computing Challenge

The LHC Computing Challenge
Tim Bell, Fabric Infrastructure & Operations Group, Information Technology Department, CERN, 2nd April 2009

The Four LHC Experiments…
- ATLAS: general purpose; origin of mass, supersymmetry; 2,000 scientists from 34 countries
- CMS: general purpose; origin of mass, supersymmetry; 1,800 scientists from over 150 institutes
- ALICE: heavy-ion collisions, to create quark-gluon plasmas; 50,000 particles in each collision
- LHCb: to study the differences between matter and antimatter; will detect over 100 million b and b-bar mesons each year

… generate lots of data … The accelerator generates 40 million particle collisions (events) every second at the centre of each of the four experiments’ detectors

… generate lots of data … reduced by online computers to a few hundred “good” events per second, which are recorded on disk and magnetic tape at 100-1,000 MegaBytes/sec, amounting to ~15 PetaBytes per year for all four experiments.
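
A rough back-of-the-envelope check of these numbers, sketched in Python; the event size and annual running time below are illustrative assumptions, not figures from the slides:

```python
# Back-of-the-envelope check of the recording rate and annual volume.
# The event size and running time below are illustrative assumptions.
event_rate_hz = 300          # "a few hundred" good events per second
event_size_mb = 1.5          # assumed average raw event size in MB
running_seconds = 1e7        # assumed ~10^7 seconds of data taking per year

rate_mb_per_s = event_rate_hz * event_size_mb
volume_pb_per_year = rate_mb_per_s * running_seconds / 1e9   # MB -> PB

print(f"recording rate ~{rate_mb_per_s:.0f} MB/s")            # ~450 MB/s
print(f"one experiment ~{volume_pb_per_year:.1f} PB/year")    # ~4.5 PB/year
# Across four experiments this is of the order of the ~15 PB/year quoted.
```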

Data Handling and Computation for Physics Analysis [dataflow diagram]: the detector feeds the event filter (selection & reconstruction), producing raw data; reconstruction at CERN turns raw data into event summary data; batch physics analysis produces analysis objects (extracted by physics topic) for interactive physics analysis; event reprocessing and event simulation feed processed and simulated data back into the chain.
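
Purely as a reading aid, a conceptual sketch of that chain; the stage names follow the diagram, but the functions and the toy data are invented for illustration:

```python
# Conceptual sketch of the analysis dataflow named in the diagram.
# Stage names follow the slide; the implementations are placeholders.

def event_filter(detector_stream):
    """Select the few hundred 'good' events per second -> raw data."""
    return [event for event in detector_stream if event.get("trigger")]

def reconstruction(raw_data):
    """Raw data -> event summary data (ESD)."""
    return [{"tracks": e.get("hits", []), "id": i} for i, e in enumerate(raw_data)]

def batch_analysis(event_summary_data, topic):
    """ESD -> analysis objects, extracted by physics topic."""
    return [e for e in event_summary_data if topic(e)]

# Interactive analysis then works on the much smaller analysis objects;
# reprocessing reruns reconstruction, and simulation injects generated events.
detector_stream = [{"trigger": True, "hits": [1, 2, 3]}, {"trigger": False}]
aod = batch_analysis(reconstruction(event_filter(detector_stream)),
                     topic=lambda e: len(e["tracks"]) > 0)
print(len(aod))   # 1 event survives the toy selection
```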

… leading to a high box count [photos of the computer centre]: ~2,500 PCs plus another ~1,500 boxes, covering CPU, disk and tape.

Computing Service Hierarchy
- Tier-0 – the accelerator centre: data acquisition & initial processing; long-term data curation; distribution of data to the Tier-1 centres.
- Tier-1 – “online” to the data acquisition process, so high availability; managed mass storage; data-heavy analysis; national and regional support:
  - Canada – TRIUMF (Vancouver)
  - France – IN2P3 (Lyon)
  - Germany – Forschungszentrum Karlsruhe
  - Italy – CNAF (Bologna)
  - Netherlands – NIKHEF/SARA (Amsterdam)
  - Nordic countries – distributed Tier-1
  - Spain – PIC (Barcelona)
  - Taiwan – Academia Sinica (Taipei)
  - UK – CLRC (Oxford)
  - US – FermiLab (Illinois) and Brookhaven (NY)
- Tier-2 – ~100 centres in ~40 countries: simulation; end-user analysis, batch and interactive.
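
A minimal sketch of the hierarchy as a data structure, handy for reasoning about where each kind of workload runs; the site lists and role descriptions come from the slide, while the structure and helper function are just illustrative:

```python
# Illustrative representation of the WLCG tier hierarchy from the slide.
TIERS = {
    "Tier-0": {
        "sites": ["CERN"],
        "roles": ["data acquisition & initial processing",
                  "long-term data curation",
                  "distribution of data to Tier-1 centres"],
    },
    "Tier-1": {
        "sites": ["TRIUMF", "IN2P3", "FZK", "CNAF", "NIKHEF/SARA",
                  "Nordic distributed Tier-1", "PIC", "Academia Sinica",
                  "CLRC", "FermiLab", "Brookhaven"],
        "roles": ["managed mass storage", "data-heavy analysis",
                  "national and regional support"],
    },
    "Tier-2": {
        "sites": ["~100 centres in ~40 countries"],
        "roles": ["simulation", "end-user analysis (batch and interactive)"],
    },
}

def where_can_run(keyword):
    """Return the tiers whose declared roles mention the given keyword."""
    return [tier for tier, info in TIERS.items()
            if any(keyword in role for role in info["roles"])]

print(where_can_run("analysis"))   # ['Tier-1', 'Tier-2']
```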

The Grid: timely technology, deployed to meet LHC computing needs. Challenges for the Worldwide LHC Computing Grid Project arise from its worldwide nature, competing middleware…, the newness of the technology, and its scale…

Interoperability in action

Site Reliability: 83 Tier-2 sites being monitored.
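
Site reliability figures of this kind are derived from periodic functional tests at each site; a minimal sketch of such a calculation, where the probe results and the exact formula are assumptions for illustration:

```python
# Illustrative reliability calculation from periodic test probes.
# results: one boolean per scheduled probe (True = test passed).
def reliability(results, scheduled_downtime_slots=0):
    """Fraction of successful probes, excluding scheduled downtime slots."""
    considered = len(results) - scheduled_downtime_slots
    return sum(results) / considered if considered else 0.0

probes = [True] * 92 + [False] * 8            # 100 probes, 8 failures (made up)
print(f"site reliability: {reliability(probes):.0%}")   # 92%
```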

Why Linux? The 1990s Unix wars left us with 6 different Unix flavours; Linux allowed all users to align behind a single OS which was low cost and dynamic. Scientific Linux is based on Red Hat with extensions for key usability and performance features: the AFS global file system and the XFS high-performance file system. But how to deploy without proprietary tools? See the EDG/WP4 report on current technology (http://cern.ch/hep-proj-grid-fabric/Tools/DataGrid-04-TED-0101-3_0.pdf) or “Framework for Managing Grid-enabled Large Scale Computing Fabrics” (http://cern.ch/quattor/documentation/poznanski-phd.pdf) for reviews of various packages.
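
The fabric-management approach reviewed in those reports boils down to desired-state configuration: each node compares a centrally declared profile with its actual state and converges on the profile. A minimal conceptual sketch of that idea; this is not quattor's template language or any real tool's API:

```python
# Conceptual desired-state convergence loop; not any real tool's API.
desired = {"kernel": "2.6.18", "afs": "installed", "xfs": "enabled"}

def current_state():
    """Placeholder: a real agent would query rpm, mounted filesystems, etc."""
    return {"kernel": "2.6.9", "afs": "installed", "xfs": "disabled"}

def converge(desired, actual):
    """Return the list of corrective actions needed on this node."""
    return [f"set {key} -> {value}"
            for key, value in desired.items()
            if actual.get(key) != value]

for action in converge(desired, current_state()):
    print(action)    # set kernel -> 2.6.18, set xfs -> enabled
```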

Deployment
- Commercial management suites: (full) Linux support was rare (5+ years ago…); much work was needed to deal with specialist HEP applications; insufficient reduction in staff costs to justify license fees.
- Scalability: 5,000+ machines to be reconfigured; 1,000+ new machines per year; a configuration change rate of 100s per day.

Dataflows and rates [diagram of scheduled data flows only]: average rates of 700 MB/s, 420 MB/s, 700 MB/s, 1,120 MB/s (1,600 MB/s) and 1,430 MB/s (2,000 MB/s). Remember these figures: they are averages, and the system needs to be able to support 2x these rates for recovery.

Volumes & Rates
- 15 PB/year; peak rate to tape >2 GB/s; 3 full SL8500 robots/year.
- Requirement in the first 5 years to reread all past data between runs: 60 PB in 4 months, i.e. ~6 GB/s. Drives can be run at a sustained 80 MB/s, so ~75 drives flat out merely for this controlled access.
- The data volume has an interesting impact on the choice of technology: efficient media use is advantageous, so high-end technology (3592, T10K) is favoured over LTO.
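
The drive count follows directly from the numbers on the slide; a quick check (with the 4 months approximated as 120 days):

```python
# Check of the reread requirement: 60 PB in 4 months with 80 MB/s drives.
volume_pb = 60
window_s = 4 * 30 * 24 * 3600            # ~4 months in seconds
drive_mb_s = 80                          # sustained rate per drive

required_mb_s = volume_pb * 1e9 / window_s            # 1 PB = 1e9 MB
print(f"required ~{required_mb_s/1000:.1f} GB/s")     # ~5.8 GB/s
print(f"drives needed: {required_mb_s / drive_mb_s:.0f}")
# ~72 drives, i.e. ~75 with the slide's rounding up to 6 GB/s.
```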

Castor Architecture (detailed view) [component diagram]: clients talk to the request handler (RH) and RR in front of the stager, which has its own DB plus job, query and error services and a scheduler; the disk cache subsystem comprises the disk servers running Mover, GC, StagerJob and MigHunter; the central services include the NameServer, VMGR, VDQM and RTCPClientD; the tape archive subsystem comprises the tape servers running TapeDaemon and RTCPD.
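
At the level of detail the diagram gives, a file read goes roughly: client request, stager lookup in the disk cache, and a tape recall if the file only exists on tape. A purely conceptual sketch of that flow, with component roles as labelled on the diagram; none of this is CASTOR's real API:

```python
# Conceptual sketch of a staged read in a CASTOR-like system; not real APIs.
disk_cache = {"/castor/run123.raw": "diskserver42"}          # files on disk
name_server = {"/castor/run123.raw", "/castor/run001.raw"}   # catalogued files

def stage_in(path):
    """Request handler -> stager: serve from disk cache or recall from tape."""
    if path not in name_server:
        raise FileNotFoundError(path)
    if path in disk_cache:
        return f"read {path} from {disk_cache[path]}"
    # Tape recall: the stager asks the tape subsystem to copy the file
    # onto a disk server, then the client reads it from the disk cache.
    disk_cache[path] = "diskserver07"   # chosen by the scheduler (placeholder)
    return f"recalled {path} to {disk_cache[path]}"

print(stage_in("/castor/run123.raw"))   # served from disk
print(stage_in("/castor/run001.raw"))   # triggers a tape recall
```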

Castor Performance

Long lifetime: LEP, CERN’s previous accelerator, started in 1989 and was shut down 10 years later. Its first data was recorded to IBM 3480s, and at least 4 different tape technologies were used over the period. All data ever taken, right back to 1989, was reprocessed and reanalysed in 2001/2. The LHC starts in 2007 and will run until at least 2020: what technologies will be in use in 2022 for the final LHC reprocessing and reanalysis? Data repacking onto newer media is required every 2-3 years; it is time consuming, and data integrity must be maintained.
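
A rough feel for why repacking is time consuming, using the ~15 PB/year accumulation rate quoted earlier; the archive size and repack window below are illustrative assumptions:

```python
# Illustrative repack cost: copying an accumulated archive to new media.
archive_pb = 45                  # e.g. ~3 years of data at ~15 PB/year (assumed)
repack_window_days = 365         # assumed: finish before the next migration
drive_mb_s = 80                  # sustained drive rate, as on the earlier slide

required_mb_s = archive_pb * 1e9 / (repack_window_days * 86400)
print(f"sustained ~{required_mb_s/1000:.2f} GB/s")            # ~1.43 GB/s
# Each repack stream needs one drive reading old media and one writing new.
print(f"~{required_mb_s / drive_mb_s:.0f} drive-pairs busy all year")
```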

Disk capacity & I/O rates

  Year   Drive capacity   Drive I/O rate   Drives per TB   Aggregate I/O per TB
  1996   4 GB             10 MB/s          250             250 x 10 MB/s = 2,500 MB/s
  2000   50 GB            20 MB/s          20              20 x 20 MB/s = 400 MB/s
  2006   500 GB           60 MB/s          2               2 x 60 MB/s = 120 MB/s

CERN now purchases two different storage server models, capacity oriented and throughput oriented; this fragmentation increases management complexity (purchase overhead has also increased…).
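
The trend the table captures, the aggregate I/O available per terabyte stored falling as drives grow, is easy to reproduce from the drive specifications given on the slide:

```python
# Aggregate I/O per TB of storage for the drive generations on the slide.
generations = {1996: (4, 10), 2000: (50, 20), 2006: (500, 60)}   # (GB, MB/s)

for year, (capacity_gb, rate_mb_s) in generations.items():
    drives_per_tb = 1000 / capacity_gb
    aggregate = drives_per_tb * rate_mb_s
    print(f"{year}: {drives_per_tb:.0f} drives/TB -> {aggregate:.0f} MB/s per TB")
# 1996: 250 drives/TB -> 2500 MB/s; 2000: 20 -> 400; 2006: 2 -> 120
```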

… and backup: TSM on Linux, with daily backup volumes of around 18 TB to 10 Linux TSM servers.
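
Per server that daily volume is modest in bandwidth terms; a quick check, assuming the load is spread evenly over the ten servers and over 24 hours (both assumptions):

```python
# Rough per-server load for the daily TSM backup traffic.
daily_tb, servers = 18, 10
per_server_mb_s = daily_tb * 1e6 / servers / 86400     # 1 TB = 1e6 MB
print(f"~{per_server_mb_s:.0f} MB/s per TSM server")   # ~21 MB/s
```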

Capacity Requirements

Power Outlook

Summary
- Immense challenges & complexity: data rates, developing software, lack of standards, worldwide collaboration, …
- Considerable progress in the last ~5-6 years: the WLCG service exists, and petabytes of data have been transferred.
- But more data is coming in November… Will the system cope with chaotic analysis? Will we understand the system well enough to identify problems and fix their underlying causes? Can we meet the requirements given the power available?