PIC, the Spanish LHC Tier-1, ready for data taking. EGEE09, Barcelona, 21-Sep-2009. Gonzalo Merino

Presentation transcript:

PIC, the Spanish LHC Tier-1, ready for data taking. EGEE09, Barcelona, 21-Sep-2009. Gonzalo Merino

The LHC: Basic Science. It tries to answer fundamental questions: What are things made of? What did the Universe look like 1 nanosecond after the Big Bang? It is the world's largest scientific machine: a 27 km ring of proton accelerators, 100 m underground, with superconducting magnets cooled to −271 °C.

Four detectors are located at the p-p collision points. They are extremely complex devices, built by collaborations of O(1000) people, with more than 100 million sensors generating petabytes of data per second.

4 WLCG. The LHC Computing Grid project started in 2002:
– Phase 1 (2002–2005): tests and development, build a service prototype.
– Phase 2 (2006–2008): deploy the initial LHC computing service.
Purpose: "to provide the computing resources needed to process and analyse the data gathered by the LHC Experiments".
Enormous data volumes and computing capacity:
– 15 PB/yr of RAW data → more than 50 PB/year overall.
– Over the experiments' lifetime: exabyte scale.
This scalability challenge made it clear that a distributed infrastructure was needed.
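As a quick illustration of where the exabyte scale comes from, here is a back-of-the-envelope calculation in Python; the 50 PB/year figure is from the slide, while the ~15-year lifetime is an assumption used only for this sketch.

```python
# Rough data-volume arithmetic. 50 PB/year (RAW + derived + simulation + replicas)
# is from the slide; the ~15-year lifetime is an assumed value for illustration.
total_pb_per_year = 50
assumed_lifetime_years = 15

total_pb = total_pb_per_year * assumed_lifetime_years
print(f"~{total_pb} PB over the LHC lifetime, i.e. ~{total_pb / 1000:.2f} EB")
# -> ~750 PB, approaching the exabyte scale
```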

5 Grid Infrastructure. Since the early 2000s, large international projects have been funded to deploy production Grid infrastructures for scientific research. Luckily for WLCG, these took on a big part of the load of building the infrastructure: WLCG is built on the big multi-science production Grid infrastructures EGEE and OSG.

6 LHC, a big EGEE user. Monthly CPU walltime usage per scientific discipline, from the EGEE Accounting Portal: the LHC is the biggest user of EGEE. There are many ways one can use EGEE, so how is the LHC using the Grid?

7 Tiered Structure. It comes from the early days (1998, MONARC) and was then mainly motivated by the limited network connectivity among sites. Today the network is not the issue, but the tiered model is still used to organise work and data flows.
Tier-0 at CERN:
– DAQ and prompt reconstruction.
– Long-term data curation.
Tier-1 (11 centres), online to the DAQ, 24x7:
– Massive data reconstruction.
– Long-term storage of a copy of the RAW data.
Tier-2 (>150 centres):
– End-user analysis and simulation.
Computing models: the four LHC experiments do not all use the tiered structure in the same way.
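A minimal sketch, with hypothetical names (this is not any experiment's actual computing-model code), of how the tier roles and the custodial RAW-data placement described above could be encoded:

```python
# Hypothetical encoding of the tiered roles and a simplified RAW-data placement.
TIER_ROLES = {
    "Tier-0": ["DAQ and prompt reconstruction", "long-term data curation"],
    "Tier-1": ["massive data reconstruction", "custodial copy of RAW data"],
    "Tier-2": ["end-user analysis", "Monte Carlo simulation"],
}

def place_raw_dataset(dataset, tier1_sites):
    """RAW data stays at the Tier-0; a second custodial copy goes to one
    Tier-1, chosen here by a simple round-robin on the dataset name."""
    tier1 = tier1_sites[len(dataset) % len(tier1_sites)]
    return [("CERN Tier-0", dataset), (tier1, dataset)]

print(place_raw_dataset("RAW_run2009_0001", ["PIC", "RAL", "FZK", "CNAF"]))
```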

8 Distribution of resources. Experiment computing requirements for the run at the different WLCG Tiers: more than 80% of the resources are outside CERN. The Grid MUST work.

The operations challenge: Scalability, Reliability, Performance.

10 Scalability. The computing and storage capacity needs of WLCG are enormous. Once the LHC starts, the growth rate will be impressive. (Plot: planned capacity ramp-up, with the number of cores deployed today marked for comparison.)

11 Reliability. Setting up and deploying a robust operational model is crucial for building reliable services on the Grid. One of the key tools for WLCG comes from EGEE: the Service Availability Monitor (SAM).

12 Improving reliability: track site status over time…

13 Improving reliability: … publish rankings …

14 Improving reliability: … and see how site reliability goes up! An increasing number of more realistic sensors, plus a powerful monitoring framework that ensures peer pressure, guarantees that the reliability of the WLCG service will keep improving.
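A toy sketch of the kind of availability/reliability bookkeeping behind these rankings; it is not the actual SAM or GridView implementation, and both the site names and the test outcomes are invented.

```python
# Toy reliability ranking from periodic test results.
# availability = time_OK / total_time
# reliability  = time_OK / (total_time - scheduled_downtime)
def availability_and_reliability(results, scheduled_down_hours):
    ok = sum(1 for r in results if r == "OK")
    total = len(results)
    usable = total - scheduled_down_hours
    return ok / total, (ok / usable if usable > 0 else 0.0)

sites = {  # hypothetical hourly test outcomes for one day, plus scheduled downtime hours
    "Site-A": (["OK"] * 22 + ["CRITICAL"] * 2, 0),
    "Site-B": (["OK"] * 20 + ["CRITICAL"] * 4, 2),
}
ranking = sorted(
    ((name, *availability_and_reliability(res, down)) for name, (res, down) in sites.items()),
    key=lambda row: row[2], reverse=True)
for name, avail, rel in ranking:
    print(f"{name}: availability {avail:.0%}, reliability {rel:.0%}")
```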

15 Performance: data volumes. CMS has been transferring 100–200 TB per day on the Grid for more than 2 years. Last June, ATLAS added 4 PB in 11 days to its total of 12 PB on the Grid.
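To turn those volumes into sustained rates, a quick worked calculation (decimal units assumed, 1 TB = 10^12 bytes):

```python
# Convert the quoted daily/period volumes into average sustained rates.
SECONDS_PER_DAY = 86400

def tb_per_day_to_mb_per_s(tb_per_day):
    return tb_per_day * 1e12 / SECONDS_PER_DAY / 1e6

print(f"100 TB/day ~= {tb_per_day_to_mb_per_s(100):.0f} MB/s")    # ~1157 MB/s
print(f"200 TB/day ~= {tb_per_day_to_mb_per_s(200):.0f} MB/s")    # ~2315 MB/s

atlas_mb_per_s = 4e15 / (11 * SECONDS_PER_DAY) / 1e6              # 4 PB in 11 days
print(f"4 PB in 11 days ~= {atlas_mb_per_s:.0f} MB/s sustained")  # ~4209 MB/s
```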

16 Port d'Informació Científica (PIC). Created in June through a collaboration agreement between four partners: DEiU, CIEMAT, UAB and IFAE. A data centre supporting scientific research that involves the analysis of massive sets of distributed data.

17 WLCG Tier-1 since Dec 2003, supporting ATLAS, CMS and LHCb.
– Target: provide 5% of the total Tier-1 capacity.
The Tier-1 represents >80% of PIC's current resources.
Goal: technology transfer from the LHC Tier-1 to other scientific areas facing similar data challenges. Computing services for other applications besides the LHC:
– Astroparticles (MAGIC data centre)
– Cosmology (DES, PAU)
– Medical imaging (neuroradiology)

Is PIC delivering as a WLCG Tier-1? Scalability, Reliability, Performance.

19 Scalability. In the last 3 years PIC has followed the capacity ramp-ups pledged in the WLCG MoU: 5-fold in CPU and >10-fold in disk and tape planned (a pledge-check sketch follows below).
– CPU cluster: Torque/Maui.
– Disk: dCache.
– Tape: Castor → Enstore migration.
– Storage hardware: Sun 4500 servers (DAS), DDN S2A (SAN).
Public-tender purchase model → different technologies → integration challenge.
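The ramp-up tracking amounts to comparing installed capacity against the MoU pledges; a minimal sketch with purely illustrative numbers (the real pledge values are not in the slide):

```python
# Hypothetical pledged vs installed capacity check (CPU in kSI2k, disk/tape in TB).
pledged   = {"cpu_ksi2k": 1800, "disk_tb": 1100, "tape_tb": 1500}  # illustrative only
installed = {"cpu_ksi2k": 1850, "disk_tb": 1050, "tape_tb": 1600}

for resource, pledge in pledged.items():
    have = installed[resource]
    status = "OK" if have >= pledge else f"short by {pledge - have}"
    print(f"{resource}: pledged {pledge}, installed {have} -> {status}")
```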

20 PIC Tier-1 Reliability. The Tier-1 reliability targets have been met in most months.

21 Performance: T0/T1 transfers. Data import from CERN and transfers with the other Tier-1s have been successfully tested above the targets:
– CMS data imported from / exported to the other T1s: combined ATLAS+CMS+LHCb targets of ~210 MB/s and ~100 MB/s.
– ATLAS daily rate CERN → PIC, June 2009: target 76 MB/s.
– CMS daily rate CERN → PIC, June 2009: target 60 MB/s.
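A minimal sketch of the check behind the "tested above targets" statement: measured daily averages compared against the nominal rates. The target numbers are from the slide; the measured values are invented for illustration.

```python
# Compare achieved CERN -> PIC daily averages against the nominal targets (MB/s).
targets  = {"ATLAS": 76, "CMS": 60}   # nominal targets from the slide
measured = {"ATLAS": 90, "CMS": 72}   # hypothetical June-2009 daily averages

for experiment, target in targets.items():
    rate = measured[experiment]
    print(f"{experiment}: {rate} MB/s vs target {target} MB/s "
          f"({rate / target:.0%} of target)")
```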

22 Challenges ahead. Tape is difficult (especially reading…):
– The basic use cases were tested in June, but it is still tricky.
– Organisation of files in tape groups has a big impact on performance (see the sketch below).
– Sharing of drives/libraries between experiments.
Access to data from jobs:
– Very different jobs must be supported: reco (1 GB / 2 h) vs. user analysis (5 GB / 10 min).
– Optimising for all of them is not possible; a compromise is needed.
– Remote open of files vs. copy to the local disk.
– Read-ahead buffer tuning, disk contention in the WNs.
Real users (chasing real data):
– Simulations of "user analysis" load have been done, with good results.
– The system has never been tested with thousands of real users simultaneously accessing data with realistic (random?) patterns.
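To illustrate the tape-groups point, a toy sketch of steering files into tape families by dataset so that a later recall of one dataset touches few tapes; the naming and grouping key are assumptions, not the dCache/Enstore file-family configuration.

```python
# Toy illustration of grouping files into tape families by dataset,
# so that recalling a dataset mounts a few tapes rather than many.
from collections import defaultdict

def assign_tape_families(files):
    """files: iterable of (filename, dataset) -> dict dataset -> [filenames]."""
    families = defaultdict(list)
    for filename, dataset in files:
        families[dataset].append(filename)
    return families

files = [
    ("evt_0001.root", "cms/RAW/Run2009A"),
    ("evt_0002.root", "cms/RAW/Run2009A"),
    ("aod_0101.root", "atlas/AOD/mc09"),
]
for family, members in assign_tape_families(files).items():
    print(f"tape family '{family}': {members}")
```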

23 Challenges ahead. The Cloud:
– A new buzzword?
– There have been several demos of running scientific applications on commercial clouds.
– We can surely learn a lot from the associated technologies, e.g. encapsulating jobs as VMs to decouple the hardware from the applications.
Multi-science support environments:
– Many of the WLCG sites also support other EGEE VOs.
– Requirements can be very different (access control to data, …).
– Maintaining a high QoS in these heterogeneous environments will be a challenge… but that is precisely the point.

Thank you. Gonzalo Merino

Backup Slides

26 Data Analysis. The original vision: a thin application layer interacting with a powerful middleware layer (a super-WMS to which the user throws input dataset queries plus algorithms, and which spits the result out). The reality today: the LHC experiments have built increasingly sophisticated software stacks to interact with the Grid, on top of the basic services (CE, SE, FTS, LFC):
– Workload management: pilot jobs, late scheduling, VO-steered prioritisation (DIRAC, AliEn, PanDA…) — see the sketch below.
– Data management: topology-aware higher-level tools capable of managing complex data flows (PhEDEx, DDM…).
– User analysis: a single interface for the whole analysis cycle that hides the complexity of the Grid (Ganga, CRAB, DIRAC, AliEn…).
Using the Grid at such a large scale is not an easy business!
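A very reduced sketch of the pilot-job / late-scheduling idea mentioned above; this is not DIRAC, AliEn or PanDA code, and the task queue, matching fields and function names are invented for illustration.

```python
# Pilot-job late scheduling in miniature: the pilot starts on a worker node first,
# then pulls a payload that matches the resources it actually found there.
import queue

task_queue = queue.Queue()
task_queue.put({"id": 1, "needs": {"site": "PIC", "memory_gb": 2}})
task_queue.put({"id": 2, "needs": {"site": "ANY", "memory_gb": 4}})

def run_pilot(site, memory_gb):
    """Advertise the local resources and pull the first compatible payload."""
    skipped = []
    matched = None
    while not task_queue.empty():
        task = task_queue.get()
        site_ok = task["needs"]["site"] in (site, "ANY")
        if matched is None and site_ok and task["needs"]["memory_gb"] <= memory_gb:
            matched = task
        else:
            skipped.append(task)
    for task in skipped:            # return non-matching tasks to the queue
        task_queue.put(task)
    if matched:
        print(f"pilot@{site}: running payload {matched['id']}")
    else:
        print(f"pilot@{site}: no matching payload, exiting")
    return matched

run_pilot("PIC", memory_gb=2)       # picks payload 1 (site and memory match)
```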

27 Performance: CPU usage. (Plot: monthly CPU usage in kSI2k·month, roughly corresponding to the number of simultaneously busy cores.)

PIC data centre: 150 m² machine room, 200 kVA IT UPS (plus diesel generator), a batch farm of ~1500 CPU cores, ~1 petabyte of disk and ~2 petabytes of tape in STK-5500 and IBM-3584 tape libraries.

29 End-to-end Throughput. Besides growing capacity, one of the challenges for sites is to sustain high-throughput data rates between the components: a 1500-core cluster, 1 petabyte of disk and 1 petabyte of tape.

30 End-to-end Throughput (cont.). Peak WN-disk rates of 2.5 GB/s were observed during June 2009.

31 End-to-end Throughput (cont.). In addition to the 2.5 GB/s WN-disk peaks of June 2009, more than 250 MB/s of combined read+write tape bandwidth was demonstrated in June.
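A quick worked calculation of what those peak rates mean as per-core and network-level figures; the core count is from the earlier slide, and the per-core share is just an average.

```python
# Express the 2.5 GB/s WN <-> disk peak as per-core and network-level figures.
peak_gb_per_s = 2.5
cores = 1500

per_core_mb_per_s = peak_gb_per_s * 1000 / cores
print(f"~{per_core_mb_per_s:.1f} MB/s per core on average")                  # ~1.7 MB/s

print(f"2.5 GB/s ~= {peak_gb_per_s * 8:.0f} Gbps of aggregate LAN traffic")  # 20 Gbps
```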

32 Data Transfers to Tier-2s. Reconstructed data is sent to the T2s for analysis; this dataflow is bursty by nature, and the experiment requirements for it are very fuzzy ("as fast as possible").
– Links to all SP/PT Tier-2s certified with sustained rates.
– CMS Computing Model: sustained transfers to >43 T2s worldwide.

33 Reliability: the experiments' view. Seeing site reliability improve, the experiments were motivated to make their own sensors for measuring it more realistic. An increasing number of more realistic sensors, plus a powerful monitoring framework that ensures peer pressure, guarantees that the reliability of the WLCG service will keep improving.