
1 Petabyte-scale computing challenges of the LHCb experiment UK e-Science All Hands Meeting 2008, Edinburgh, 9th September 2008 Y. Y. Li on behalf of the LHCb collaboration

2 Outline
- The questions…
- The LHC – the experiments taking up the challenge
- The LHCb experiment at the LHC
- LHCb computing model
  - Data flow, processing requirements
- Distributed computing in LHCb
  - Architecture and functionality
  - Performance

3 The questions…
- The Standard Model of particle physics explains many of the interactions between the fundamental particles that make up the Universe, and all experiments so far confirm its predictions
- BUT many questions still remain…
  - How does gravity fit into the model?
  - Where does all the mass come from?
  - Why do we have a Universe made up of matter?
  - Does dark matter exist, and how much?
- Search for phenomena beyond our current understanding
- Go back to the first billionth of a second after the BIG BANG…

4 The Large Hadron Collider
- 100 m below the surface on the Swiss/French border
- 14 TeV proton-proton collider, 7x higher energy than previous machines
- 27 km ring of 1,232 superconducting magnets chilled to −271.3°C
- 4 experiments/detectors: ATLAS, CMS, ALICE, LHCb
- After ~25 years since its first proposal… first circulating beam tomorrow! First collisions in October 2008.

5 LHCb
- The LHC beauty experiment: a special-purpose detector searching for:
  - New physics in very rare b-quark decays
  - Particle–antiparticle asymmetry
- ~1 trillion bb pairs produced per year!
- VELO (VErtex LOcator) – locates the b-decay vertex; operates only ~5 mm from the beam
- RICH (Ring Imaging CHerenkov) detector – particle identification; the human eye processes ~100 photos/s, the RICH ~40 million photos/s

6 Data flow
- Five main LHCb applications (C++: Gauss, Boole, Brunel, DaVinci; Python: Bender)
- Production job: Gauss (event generation, detector simulation) → Boole (digitization) → Brunel (reconstruction), with detector calibrations applied
- Analysis job: DaVinci / Bender (analysis) → statistics
- RAW data flows from the detector into reconstruction
- Sim – simulation data format; DST – Data Summary Tape
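As a rough illustration of how a production chain strings these applications together, here is a minimal Python sketch; the stage names follow the slide, but the functions and the data they pass around are hypothetical and not part of the real LHCb software.

# Illustrative only: models the Gauss -> Boole -> Brunel -> DaVinci chain
# as simple functions passing a small dictionary between stages.

def gauss(n_events):
    # Event generation + detector simulation -> Sim format
    return {"format": "Sim", "events": n_events}

def boole(sim):
    # Digitization: turn simulated hits into detector-like raw data
    return {"format": "Digi", "events": sim["events"]}

def brunel(digi):
    # Reconstruction: build physics objects, write a DST (Data Summary Tape)
    return {"format": "DST", "events": digi["events"]}

def davinci(dst):
    # Physics analysis: select interesting events and produce statistics
    selected = dst["events"] // 20          # arbitrary selection fraction
    return {"format": "ntuple", "events": selected}

if __name__ == "__main__":
    stats = davinci(brunel(boole(gauss(1000))))
    print(stats)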

7 CPU times
- 40 million collisions (events) per second in the detector
- 2,000 interesting events selected per second = 50 MB/s of data transferred and stored
- Offline: full reconstruction, 150 MB processed per second of running
- Full simulation (Gauss event generation + detector simulation) → reconstruction (100 MB/event) → DST (500 KB/event)
- Full simulation to DST: ~80 s/event (2.8 GHz Xeon processor)
- ~100 years for 1 CPU to simulate 1 s of real data!
- 10^7 s of data taking per year + simulation → ~O(PB) of data per year
- 962 physicists, 56 institutes on 4 continents
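A quick back-of-the-envelope check of the two headline numbers on this slide, using only the figures quoted above:

# Back-of-the-envelope check of the slide's numbers (no external data assumed).

SECONDS_PER_YEAR = 3600 * 24 * 365

collisions_per_s = 40e6          # 40 million collisions per second
sim_time_per_event = 80.0        # ~80 s/event on a 2.8 GHz Xeon

cpu_seconds_per_real_second = collisions_per_s * sim_time_per_event
print(cpu_seconds_per_real_second / SECONDS_PER_YEAR)   # ~101 years -> "~100 years"

rate_to_storage = 50e6           # 50 MB/s transferred and stored
data_taking = 1e7                # 10^7 s of data taking per year
print(rate_to_storage * data_taking / 1e15)             # ~0.5 PB/year of RAW alone, plus simulation -> O(PB)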

8 LHCb computing structure
- Tier 0: CERN – raw data, ~3K CPUs
- Tier 1: large centres (RAL UK, PIC Spain, IN2P3 France, GridKA Germany, NIKHEF Netherlands, CNAF Italy) – reconstruction and analysis, ~15K CPUs
- Tier 2: universities (~34) – simulations, ~19K CPUs
- Tier 3/4: laptops, desktops, etc. – simulations
- Transfers between centres: detector RAW data at ~10 MB/s, simulation data at ~1 MB/s
- Needs distributed computing

9 LHCb grid middleware – DIRAC
- LHCb's grid middleware: Distributed Infrastructure with Remote Agent Control
- Written in Python, multi-platform (Linux, Windows)
- Built with common grid tools, e.g. GSI (Grid Security Infrastructure) authentication
- Pulls together all resources, shared with other experiments, using an experiment-wide CPU fair share
- Optimises CPU usage for both:
  - long, steady simulation jobs run by production managers
  - chaotic analysis usage by individual users
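To illustrate the fair-share idea in the abstract, here is a toy priority calculation; this is a generic illustration of fair-share scheduling, not DIRAC's actual algorithm, and the numbers are invented.

# Toy fair-share illustration (NOT DIRAC's algorithm): a group whose recent
# CPU usage is below its allotted share gets higher priority, so bursty
# analysis work can interleave with long, steady production jobs.

def priority(share, recent_usage):
    """share and recent_usage are fractions of total CPU (0..1)."""
    return max(share - recent_usage, 0.0)

groups = {
    "production": {"share": 0.7, "recent_usage": 0.72},  # slightly over its share
    "analysis":   {"share": 0.3, "recent_usage": 0.05},  # mostly idle until a burst
}

for name, g in groups.items():
    print(name, round(priority(g["share"], g["recent_usage"]), 3))
# analysis now outranks production for the next free CPU slot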

10 DIRAC architecture
- Service-oriented architecture with 4 parts: user interfaces, services, agents, resources
- Uses a pull strategy for assigning CPUs: free, stable CPUs request jobs from the central server
- Useful for masking the instability of resources from users
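To make the pull strategy concrete, here is a minimal sketch of an agent-style polling loop; the queue object and job interface are invented for illustration and do not correspond to real DIRAC classes.

# Minimal pull-model sketch: a worker asks the central queue for work only when
# it is free, so an unstable worker simply stops asking (invisible to the user).
import queue
import time

class MatchService:
    """Stands in for the central DIRAC task queue (illustrative only)."""
    def __init__(self, jobs):
        self._q = queue.Queue()
        for j in jobs:
            self._q.put(j)

    def request_job(self):
        try:
            return self._q.get_nowait()   # hand out a job if one is waiting
        except queue.Empty:
            return None                   # nothing to do right now

def agent_loop(service, run_job, idle_sleep=1.0, max_idle=3):
    idle = 0
    while idle < max_idle:
        job = service.request_job()       # the agent *pulls*; nothing is pushed
        if job is None:
            idle += 1
            time.sleep(idle_sleep)
            continue
        idle = 0
        run_job(job)

if __name__ == "__main__":
    svc = MatchService(["sim_000", "sim_001", "user_analysis_42"])
    agent_loop(svc, run_job=lambda j: print("running", j), idle_sleep=0.1)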

11 [Architecture diagram] Linux-based and multi-platform components; a combination of DIRAC services and non-DIRAC services; web monitoring

12 Security and data access
- DISET, the DIRAC SEcuriTy module
  - Uses OpenSSL and a modified pyOpenSSL
  - Provides proxy support for secure access
  - A DISET portal facilitates secure access on platforms where the authentication process is OS dependent
  - Platform binaries are shipped with DIRAC; the version is determined during installation
- Various data access protocols supported: SRM, GridFTP, .NetGridFTP on Windows, etc.
- Data Services run on the main server
  - Each file is assigned a logical file name (LFN) that maps to its physical file name(s)
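The logical-to-physical mapping can be pictured as a simple replica catalogue; the sketch below is purely illustrative (a dictionary keyed by LFN, with clearly fake storage endpoints), not the real DIRAC file catalogue interface.

# Toy replica catalogue: one logical file name (LFN) -> several physical replicas.
# Purely illustrative; the real DIRAC catalogue is a remote service, not a dict.

replica_catalogue = {
    "/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst": [
        "srm://se.tier1-a.example/lhcb/Presel_00980000_00001212.dst",
        "gsiftp://se.tier1-b.example/lhcb/Presel_00980000_00001212.dst",
    ],
}

def resolve(lfn, preferred_protocol=None):
    """Return the physical replicas for an LFN, optionally filtered by protocol."""
    replicas = replica_catalogue.get(lfn, [])
    if preferred_protocol:
        replicas = [r for r in replicas if r.startswith(preferred_protocol + "://")]
    return replicas

print(resolve("/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst", "srm"))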

13 Compute element resources
- Other grids, e.g. WLCG (Worldwide LHC Computing Grid) – Linux machines
- Local batch systems, e.g. Condor
- Stand-alone machines: desktops, laptops, etc.
- Windows: 3 sites so far, ~100 CPUs
  - Windows Server, Windows Compute Cluster, Windows XP
  - ~90% of the world's computers run Windows

14 Pilot agents
- Used to access other grid resources, e.g. WLCG via gLite
- A user job triggers the submission of a pilot agent by DIRAC as a 'grid job' to reserve CPU time
- The pilot on the worker node (WN) checks the environment before retrieving the user job from the DIRAC WMS
- Advantages (see the sketch below):
  - Easy control of CPU quotas for shared resources
  - Several pilot agents can be deployed for the same job if a failure occurs on the WN
  - If the full reserved CPU time is not used, another job can be retrieved from the DIRAC WMS
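A hedged sketch of the pilot-agent idea described above; the environment check, the WMS interface, and the time accounting are simplified stand-ins, not the real DIRAC pilot code.

# Simplified pilot-agent logic: check the worker node, then keep pulling user
# jobs from the (stand-in) WMS until the reserved CPU time runs out.
import shutil
import time

def environment_ok(min_disk_gb=2):
    """Very rough sanity check of the worker node (illustrative only)."""
    free_gb = shutil.disk_usage("/").free / 1e9
    return free_gb >= min_disk_gb

def pilot(wms_fetch_job, run_job, reserved_cpu_seconds):
    if not environment_ok():
        return "environment check failed; another pilot can be sent for this job"
    deadline = time.time() + reserved_cpu_seconds
    while time.time() < deadline:
        job = wms_fetch_job()             # pull a matching user job from the WMS
        if job is None:
            break                         # nothing left for this pilot
        run_job(job)                      # unused reserved time -> fetch the next job
    return "pilot finished"

if __name__ == "__main__":
    jobs = iter(["analysis_007", "simulation_113"])
    print(pilot(lambda: next(jobs, None), lambda j: print("running", j), reserved_cpu_seconds=5))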

15 Agents on Windows
- Windows resources – CPU scavenging
  - Non-LHC-dedicated CPUs: spare CPUs at universities, private home computers, etc.
  - The agent launch would be triggered by, e.g., a screen saver
  - The CPU resource contribution is determined by the owner during DIRAC installation
- Windows Compute Cluster
  - A single DIRAC installation is shared
  - A job wrapper submits the retrieved jobs via Windows Compute Cluster submission calls (see the sketch below)
  - Local job scheduling is handled by the Windows Compute Cluster scheduling service
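A sketch of the job-wrapper idea: forward a job retrieved from DIRAC to the local scheduler. The submit command below is a placeholder; the actual Windows Compute Cluster submission call and its arguments are site and version specific and are not shown on the slide.

# Sketch only: hand a retrieved DIRAC job to a local cluster scheduler.
import subprocess

def wrap_and_submit(job_id, job_script, submit_cmd=("job", "submit")):
    """Forward a retrieved DIRAC job to the local scheduler (placeholder command)."""
    cmd = list(submit_cmd) + [job_script]
    print(f"submitting DIRAC job {job_id} via: {' '.join(cmd)}")
    # In a real wrapper the return code / scheduler job id would be reported
    # back to the DIRAC WMS; here we only run the command.
    return subprocess.run(cmd, capture_output=True, text=True)

# Example (requires the scheduler CLI to be present on the node):
# result = wrap_and_submit("sim_113", "run_gauss.bat")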

16 Cross-platform submissions
- Submissions are made with a valid grid proxy
- Three ways:
  - JDL (Job Description Language)
  - DIRAC API
  - Ganga job management system – built on DIRAC API commands; full porting to Windows is in progress

JDL example:

SoftwarePackages = { "DaVinci.v19r12" };
InputSandbox = { "DaVinci.opts" };
InputData = { "LFN:/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst" };
JobName = "DaVinci_1";
Owner = "yingying";
StdOutput = "std.out";
StdError = "std.err";
OutputSandbox = { "std.out", "std.err", "DaVinci_v19r12.log", "DVhbook.root" };
JobType = "user";

DIRAC API example:

import DIRAC
from DIRAC.Client.Dirac import *
dirac = Dirac()
job = Job()
job.setApplication('DaVinci', 'v19r12')
job.setInputSandbox(['DaVinci.opts'])
job.setInputData(['LFN:/lhcb/production/DC06/v2/00980000/DST/Presel_00980000_00001212.dst'])
job.setOutputSandbox(['DaVinci_v19r12.log', 'DVhbook.root'])
dirac.submit(job)

- User pre-compiled binaries can also be shipped; jobs are then bound to be processed on the same platform
- Successfully used in full selection and background analysis studies (user on Windows; resources on Windows and Linux)

17 Performance
- Successful processing of data challenges since 2004
- Latest data challenge:
  - record of >10,000 simultaneous jobs (analysis and production)
  - 700M events simulated in 475 days, ~1,700 years of CPU time
[Plot: running jobs at Windows and Linux sites; total running jobs: 9,715]

18 Conclusions
- The LHC experiments each expect O(PB) of data per year
- The data will be analysed by thousands of physicists on 4 continents
- The LHCb distributed computing structure is in place, pulling together ~40K CPUs from across the world
- The DIRAC system has been fine-tuned based on the experience of the past 4 years of intensive testing
- We now eagerly await the LHC switch-on and the true test! First beams tomorrow morning!!!

