Large scale data flow in local and GRID environment
V. Kolosov, I. Korolko, S. Makarychev (ITEP Moscow)

Research objectives
Plans: large scale data flow simulation in local and GRID environment.
Done:
1. Data flow optimization in a realistic DC environment (ALICE and LHCb MC production)
2. Simulation of intensive data flow during data analysis (CMS-like jobs)

ITEP LHC computer farm (1): main components
64 Pentium IV PC modules (…)
A. Selivanov (ITEP-ALICE), head of the ITEP-LHC farm

ITEP LHC computer farm (2)
BATCH nodes: 32 (LCG) + 32 (PBS)
- CPU: 64 PIV 2.4 GHz (hyperthreading)
- RAM: 1 GB
- Disks: 80 GB
Mass storage: 18 TB of disk space on a Gbit/s network
Links: 100 Mbit/s internal, ~622 Mbit/s to CERN

ITEP LHC farm
Since …, LHC experiments are using ITEP facilities permanently; till now we were mainly producing MC samples.
[Figure: ITEP view from the GOC Accounting Services]

ALICE and LHCb DC (2004)
ALICE:
- determine readiness of the off-line framework for data processing
- validate the distributed computing model
- 10% test of the final capacity
- physics: hard probes (jets, heavy flavours) and pp physics
LHCb:
- studies of high-level triggers
- S/B studies, consolidate background estimates and background properties
- robustness test of the LHCb software and production system
- test of the LHCb distributed computing model
- massive MC production: … M events in 3 months

ALICE and LHCb DC (2004)
ALICE (AliEn):
- 1 job – 1 event
- raw event size: 2 GB, ESD size: … MB
- CPU time: 5-20 hours, RAM usage: huge
- store local copies, backup sent to CERN
- massive data exchange with local disk servers
LHCb (DIRAC):
- 1 job – 500 events
- raw event size: ~1.3 MB, DST size: … MB
- CPU time: … hours, RAM usage: moderate
- store local copies of DSTs, DSTs and LOGs sent to CERN
- frequent communication with central services

Optimization
April – start of the massive LHCb DC:
- 1 job/CPU – everything OK
- use hyperthreading, 2 jobs/CPU – increases efficiency by 30-40%
May – start of the massive ALICE DC:
- bad interference with LHCb jobs, frequent NFS crashes
- restrict the ALICE queue to 10 simultaneous jobs, optimize communication with the disk server
June – September – smooth running:
- share resources: LHCb in June-July, ALICE in August-September
- careful online monitoring of jobs (on top of the usual monitoring from the collaborations)

Monitoring
Frequent power cuts in summer (4-5 times), ~5% of jobs lost:
- all intermediate steps are lost (…)
- provide a reserve power line and a more powerful UPS
Stalled jobs, ~10%:
- infinite loops in GEANT4 (LHCb), crashes of central services
- write a simple check script and kill such jobs (bug report is not sent…); a sketch of such a script is shown after this list
Slow data transfer to CERN:
- poor and restricted link to CERN, problems with CASTOR
- automatic retry
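The check script itself is not shown in the slides; the following is only a minimal sketch of what such a stalled-job watchdog could look like, assuming Linux /proc accounting, hypothetical production accounts (lhcbprod, aliprod) and a one-hour stall threshold.

#!/usr/bin/env python
# Minimal sketch of a stalled-job watchdog (illustration only, not the original ITEP script).
# Assumptions: Linux /proc, jobs run under the hypothetical accounts below, 1-hour stall threshold.
import os
import pwd
import signal
import time

STALL_SECONDS = 3600      # kill a job whose CPU time has not advanced for one hour
CHECK_INTERVAL = 600      # sampling period, seconds
CLOCK_TICKS = os.sysconf("SC_CLK_TCK")

PROD_USERS = ("lhcbprod", "aliprod")   # hypothetical production accounts
PROD_UIDS = set()
for user in PROD_USERS:
    try:
        PROD_UIDS.add(pwd.getpwnam(user).pw_uid)
    except KeyError:
        pass                            # account not present on this node

def cpu_seconds(pid):
    """Return user+system CPU seconds of a process, or None if it has exited."""
    try:
        with open("/proc/%d/stat" % pid) as f:
            data = f.read()
        # the command name (field 2) may contain spaces, so split after the closing ')'
        fields = data[data.rindex(")") + 2:].split()
        return (int(fields[11]) + int(fields[12])) / float(CLOCK_TICKS)
    except (IOError, OSError, ValueError, IndexError):
        return None

def production_pids():
    """All processes owned by the production accounts (a crude job selector)."""
    pids = []
    for entry in os.listdir("/proc"):
        if entry.isdigit():
            try:
                if os.stat("/proc/" + entry).st_uid in PROD_UIDS:
                    pids.append(int(entry))
            except OSError:
                pass
    return pids

progress = {}   # pid -> (last observed CPU seconds, wall-clock time of last progress)
while True:
    for pid in production_pids():
        cpu = cpu_seconds(pid)
        if cpu is None:
            progress.pop(pid, None)
            continue
        last_cpu, last_time = progress.get(pid, (cpu, time.time()))
        if cpu > last_cpu:
            progress[pid] = (cpu, time.time())           # job is making progress
        elif time.time() - last_time > STALL_SECONDS:
            try:
                os.kill(pid, signal.SIGKILL)             # looks stalled: kill it
            except OSError:
                pass                                     # already gone
            progress.pop(pid, None)
        else:
            progress[pid] = (last_cpu, last_time)
    time.sleep(CHECK_INTERVAL)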

DC Summary
Quite visible participation in the ALICE and LHCb DCs:
- ALICE → ~5% contribution (ITEP part ~70%)
- LHCb → ~5% contribution (ITEP part ~70%)
- with only 44 CPUs
Problems reported to colleagues in the collaborations.
Today MC production is a routine task running on LCG (LCG efficiency is still rather low).

Data Analysis
Distributed analysis – a very different work-load pattern:
- CMS: event size 300 kB, CPU time 0.25 kSI2k/event
- LHCb: event size 75 kB, CPU time 0.3 kSI2k/event
Modern CPUs are ~1 kSI2k → 4 events/sec; in 2 years from now, 2-3 kSI2k → 8-12 events/sec.
Data reading rate ~3 MB/sec, with many (up to 100) jobs running in parallel.
Should we expect serious degradation of cluster performance during simultaneous data analysis by all LHC experiments?
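To make the quoted rates explicit, the short sketch below redoes the arithmetic from the slide numbers (event sizes, kSI2k cost per event, ~1 kSI2k per CPU today and 2-3 kSI2k in two years, up to 100 parallel jobs). It is an illustration of the estimate, not code from the talk.

# Back-of-the-envelope check of the rates quoted on this slide (illustration only).
# Event sizes and per-event CPU costs are taken from the slide; 100 parallel jobs is
# the slide's upper estimate.
experiments = {
    # experiment: (event size in kB, CPU cost in kSI2k*s per event)
    "CMS":  (300, 0.25),
    "LHCb": (75, 0.30),
}

for cpu_power in (1.0, 2.5):            # kSI2k per CPU: today (~1) and in ~2 years (2-3)
    for name, (size_kb, cost) in experiments.items():
        events_per_sec = cpu_power / cost
        mb_per_sec = events_per_sec * size_kb / 1000.0
        print("%-4s @ %.1f kSI2k: %5.1f events/s, %5.2f MB/s per job, %6.0f MB/s for 100 jobs"
              % (name, cpu_power, events_per_sec, mb_per_sec, 100 * mb_per_sec))

For CMS this reproduces the slide's figures: 1/0.25 = 4 events/sec today, about 10 events/sec at 2.5 kSI2k, i.e. roughly 3 MB/s of DST reading per job, and a much larger aggregate load once many jobs run in parallel.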

Simulation of data analysis
A CMS-like job analyses 1000 events in 100 seconds.
DST files are stored on a single file server.
Smoothly increase the number of parallel jobs, measuring the DST reading time.
Increase the number of allowed NFS daemons (8 is the default value).
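The measurement procedure is not spelled out on the slide; below is only a minimal sketch of such a scaling test, with hypothetical file paths and job counts. Each worker streams one DST-sized file over the NFS mount and the driver reports the aggregate throughput as the number of parallel readers grows.

# Sketch of the parallel-read scaling test (illustration only; paths and counts are placeholders).
import multiprocessing
import time

DST_FILES = ["/nfs/dstserver/dst_%03d.root" % i for i in range(100)]   # hypothetical paths
BLOCK = 1024 * 1024     # read in 1 MB chunks

def read_file(path):
    """Stream one file and return (seconds spent, bytes read)."""
    start = time.time()
    nbytes = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(BLOCK)
            if not chunk:
                break
            nbytes += len(chunk)
    return time.time() - start, nbytes

if __name__ == "__main__":
    for njobs in (4, 8, 16, 32, 64, 100):      # smoothly increase the number of parallel jobs
        pool = multiprocessing.Pool(njobs)
        t0 = time.time()
        results = pool.map(read_file, DST_FILES[:njobs])
        pool.close()
        pool.join()
        wall = time.time() - t0
        total_mb = sum(n for _, n in results) / 1e6
        print("%3d jobs: %8.1f MB in %6.1f s -> %6.1f MB/s aggregate"
              % (njobs, total_mb, wall, total_mb / wall))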

Simulation of data analysis
- Simultaneous jobs getting data from a single file server are running without significant degradation of performance.
- A further increase of the number of jobs is dangerous.
- Full load of the cluster with analysis jobs decreases the efficiency of CPU usage by a factor of 2 (32 CPUs only…).
[Figure: file server load]

Summary
To analyze LHC data (in 2 years from now) we have to improve our clusters considerably:
- use faster disks for data storage (currently 70 MB/s)
- use a 10 Gbit network for the file servers
- distribute data over many file servers
- optimize the structure of the cluster