Computing and LHCb Raja Nandakumar

The LHCb experiment
 The Universe is made of matter
 It is still not clear why
 Andrei Sakharov’s theory of CP violation
 Study CP violation
 Indirect evidence of new physics
 There are many other questions (of course)
 The LHCb experiment has been built
 Hope to answer some of these questions

The LHCb detector
[Photographs: February 2002, cavern ready for detector installation; August 2008]

How the data looks

The detector records …
 >1 million channels of data every bunch crossing
 25 ns between bunch crossings
 The trigger reduces this to about 2000 events/sec
 ~7 million events / hour
 ~25 kB raw event size
 4.3 TB/day (see the arithmetic sketch below)
 Not as much as ATLAS / CMS, but still …
 Assuming continuous operation
 Breaks for fills, etc.
 These events will need to be farmed out of CERN
 Reconstructed and stripped at Tier-1s
 Then replicated to all LHCb Tier-1 sites
 Finally available for user analysis
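The data volumes above follow directly from the trigger output rate and the raw event size; a minimal back-of-the-envelope check in Python, assuming continuous running as the slide notes:

```python
# Back-of-the-envelope LHCb raw data throughput, assuming continuous running.
trigger_rate_hz = 2000       # events per second after the trigger
raw_event_size_kb = 25       # approximate raw event size

events_per_hour = trigger_rate_hz * 3600
throughput_mb_per_s = trigger_rate_hz * raw_event_size_kb / 1000.0
volume_tb_per_day = throughput_mb_per_s * 86400 / 1e6

print(f"{events_per_hour / 1e6:.1f} million events/hour")  # ~7.2 million
print(f"{throughput_mb_per_s:.0f} MB/s")                   # ~50 MB/s
print(f"{volume_tb_per_day:.2f} TB/day")                   # ~4.32 TB/day
```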

The LHCb computing model
[Diagram: data flows from CERN through production sites (T2/T1/T0) running simulation + digitization (.digi); reconstruction at T1/T0 (.digi → .rdst); stripping at T1/T0 (.rdst → .dst); .dst files are distributed among T1/T0 sites via FTS for user analysis at T1/T0]
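Purely as an illustration of the chain in the diagram, the stages and their file formats can be written down as a small table in code; this is a readability sketch, not part of any LHCb software:

```python
# Sketch of the LHCb processing chain shown in the diagram above:
# (stage, where it runs, input format, output format).
PROCESSING_CHAIN = [
    ("Simulation + digitization", "T2/T1/T0", None,     ".digi"),
    ("Reconstruction",            "T1/T0",    ".digi",  ".rdst"),
    ("Stripping",                 "T1/T0",    ".rdst",  ".dst"),
    ("User analysis",             "T1/T0",    ".dst",   None),
]

for stage, tiers, inp, out in PROCESSING_CHAIN:
    print(f"{stage:26s} {tiers:9s} {inp or '-':6s} -> {out or '-'}")
```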

LHCb job submission
 Computing is distributed all over the world
 Particle physics is collaborative across institutes in many nations
 Both CPU and storage are available at various sites
 Welcome to the world of grid computing
 Take advantage of distributed resources
 Set up a framework for other disciplines also
 Fault-tolerant job execution
 Also used by medicine, chemistry, space science, …
 LHCb interface: DIRAC (a minimal submission sketch follows)
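For concreteness, submitting an LHCb job through the DIRAC Python client API of that era looked roughly like the sketch below; the application version, the file names and the method names other than setApplication are assumptions, not a verified interface:

```python
# Rough sketch of DIRAC client-side job submission (DIRAC 2-era API);
# version numbers, file names and some method names are hypothetical.
from DIRAC.Client.Dirac import *

dirac = Dirac()

job = Job()
job.setApplication('DaVinci', 'v19r12')      # hypothetical application version
job.setInputSandbox(['MyAnalysis_opts.py'])  # hypothetical options file (assumed method)
job.setOutputSandbox(['analysis.root'])      # hypothetical output file (assumed method)

job_id = dirac.submit(job)   # exact submission call may differ between DIRAC versions
print('Submitted job', job_id)
```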

What the user sees …
 Submit the job to the “grid”
 Ganga (ATLAS/LHCb)
 Sometimes needs a lot of persuasion
 Usually the job comes back successful
 On occasion problems are seen
 Frequently wrong parameters, code, …
 Correct and resubmit
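From the user's point of view this typically amounts to a few lines in a Ganga session; the sketch below assumes the LHCb Ganga plugins (DaVinci application, Dirac backend) and uses a hypothetical options file name:

```python
# Minimal Ganga sketch, typed inside an interactive Ganga session where Job,
# DaVinci and Dirac are already in the namespace; file names are hypothetical.
j = Job()
j.name = 'my-lhcb-analysis'
j.application = DaVinci()                      # LHCb analysis application
j.application.optsfile = 'MyAnalysis_opts.py'  # hypothetical job options file
j.backend = Dirac()                            # run on the grid through DIRAC
j.submit()

jobs   # the Ganga job registry: check status, then retrieve output when done
```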

What the user does not see …

Requirements of DIRAC
 Fault tolerance
 Retries
 Duplication
 Failover
 Caching
 Watchdogs
 Logs
 Thread safety
 Guard against possible grid problems …
 Network, timeouts
 Drive failures
 Systems hacked
 Bugs in code
 Overloaded machine, service
 Fire, cooling problems
 If it cannot go wrong, it still will
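To make "retries" and "failover" concrete, here is a generic sketch of the kind of defensive pattern such a system relies on; the endpoints and limits are invented for illustration and are not DIRAC's actual configuration:

```python
# Generic retry-with-failover sketch; endpoint names and limits are illustrative.
import time

ENDPOINTS = ["https://primary.example.org", "https://failover.example.org"]

def call_with_failover(request_fn, max_retries=3, backoff_s=5):
    """Try each endpoint in turn, retrying transient failures with a back-off."""
    last_error = None
    for endpoint in ENDPOINTS:
        for attempt in range(max_retries):
            try:
                return request_fn(endpoint)
            except (ConnectionError, TimeoutError) as exc:
                last_error = exc
                time.sleep(backoff_s * (attempt + 1))   # simple linear back-off
    raise RuntimeError(f"All endpoints failed: {last_error}")
```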

Submitting jobs on the grid
 Two ways of submitting jobs
 Push jobs out to a site’s batch system
 The grid becomes a simple collection of batch systems
 The job waits at the site until it runs
 We (LHCb) lose control of jobs once they leave us
 Many things can change in the time between job submission and running
 We only see the batch systems / queues
 We do not see the status of the grid in real time
 Cause of the low success rate in previous experience:
 Load on the site
 Temporary site downtime
 Change in job priority within the experiment
 Pull jobs into the site
 Pilot jobs

Pilot jobs
 “Wrapper” jobs
 Submitted to a site if the site is available and free, and there are waiting jobs
 The pilot job reports information at the current time
 A job may have resource requirements too …
 The pilot looks at the local environment and requests a job from DIRAC (see the sketch below)
 DIRAC returns the highest-priority job matching the available resources
 Internal job prioritisation within DIRAC
 DIRAC has the latest information on experiment priorities
 The pilot exits after a short delay if no matching job is found
 Gives a fine-grained (worker-node level) view of the grid
 Very high job success rate
 Pioneered by LHCb
 Very simple requirements for sites
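The heart of a pilot is a small matching loop. The sketch below illustrates the idea only: the matcher URL, the stubbed helper functions and the sleep interval are all hypothetical, not the real DIRAC interfaces. The short sleep before exiting is the guard against "black hole" worker nodes mentioned later in the talk:

```python
# Simplified sketch of a pilot job's matching loop. Everything here is
# illustrative: the matcher URL, the stubbed helpers and the sleep interval
# are placeholders, not the real DIRAC interfaces.
import platform
import time

MATCHER_URL = "https://dirac.example.org/WorkloadManagement/Matcher"  # hypothetical


def gather_local_resources():
    """Describe the worker node so the matcher can pick a suitable job."""
    return {"platform": platform.machine(), "site": "EXAMPLE-SITE", "free_slots": 1}


def request_matching_job(url, resources):
    """Ask the central task queue for the highest-priority waiting job that
    matches these resources. Stubbed out here; a real pilot calls the WMS."""
    return None  # pretend nothing matched


def run_pilot():
    job = request_matching_job(MATCHER_URL, gather_local_resources())
    if job is None:
        # No matching work: sleep briefly before exiting, so a broken
        # "black hole" node cannot burn through pilots at full speed.
        time.sleep(60)
        return
    job.run()              # execute the real payload
    job.upload_outputs()   # report status and send outputs back


if __name__ == "__main__":
    run_pilot()
```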

 Does everything on the previous slide
 Refinements are still needed (as always)
 Job prioritisation is still static
 Dynamic job prioritisation is on the way
 Basic logs are all in place
 Not everything is easy to view for a user / shifter
 Being improved
 More improvements in resilience are upcoming
 DIRAC portal:
 All the information needed by LHCb users
 Locating data, job monitoring, …
 Restricted information for outsiders
 Grid privacy issues
 Ganga + DIRAC is the only official LHCb grid interface
 Will support any reasonable use case

Successes …
 A single machine is the DIRAC server
 No particular load issues seen

Analysis also going on
[Plot: comparison of different Monte Carlo samples]

The occasional problem
 “Black hole” worker nodes
 A bad environment that cannot match any jobs
 A sink for our pilot jobs
 Once a sink for production jobs as well, during the migration from SL3 to SL4
 Fix: introduce a short sleep before the pilot exits
 DOS attack on CERN servers
 Software was being downloaded from CERN
 This was done whenever the software was not available locally
 Now users do not install software themselves

We do not understand …
 Very, very preliminary
 Still working on understanding this
 The “same” class of CPUs at different sites
[Plot: CPU time scaled to the median for the CPU class]

Now over to ATLAS …