1. Distributed Analysis in the BaBar Experiment
Tim Adye, Particle Physics Department, Rutherford Appleton Laboratory
University of Oxford, 11th November 2002

2. Talk Plan
- Physics motivation
- The BaBar Experiment
- Distributed analysis and the Grid

3. Where did all the Antimatter Go?
- Nature treats matter and antimatter almost identically, but the Universe is made up of just matter. How did this asymmetry arise?
- The "Standard Model" of particle physics allows for a small matter-antimatter asymmetry in the laws of physics, seen in some K0-meson decays (e.g. a 0.3% asymmetry).
- This "CP Violation" in the Standard Model is not large enough to explain the cosmological matter-antimatter asymmetry on its own.
- Until recently, CP Violation had only been observed in K decays. To understand more, we need examples from other systems...

4. What BaBar is looking for
- The Standard Model also predicts that we should be able to see the effect in B-meson decays.
- B-mesons can decay in hundreds of different modes.
- In the decays B0 → J/ψ K0_S and anti-B0 → J/ψ K0_S, we look for differences in the time-dependent decay rate between B0 and anti-B0.
(Plot: the time-dependent asymmetry.)

5. First Results
Summary of the summary:
- First results from BaBar (and its rival experiment, Belle) confirm the Standard Model of particle physics.
- The observed CP Violation is too small to explain the cosmological matter-antimatter asymmetry...
- ...but there are many, many more decay modes to examine. We are making more than 80 measurements with different B-meson, charm, and τ-lepton decays.

6. Experimental Challenge
- Individual decays of interest are only 1 in 10^4 to 10^6 B-meson decays.
- We are looking for a subtle effect in rare (and often difficult to identify) decays, so we need to record the results of a very large number of events.

7. The BaBar Collaboration
9 countries, 74 institutions, 566 physicists.

8. PEP-II e+e- Ring at SLAC
- Low Energy Ring (e+, 3.1 GeV)
- High Energy Ring (e-, 9.0 GeV)
- Linear Accelerator
- PEP-II ring circumference: 2.2 km
(Diagram: the PEP-II ring and the BaBar detector.)

9. The BaBar Detector
- ~10^8 B0 anti-B0 decays recorded.
- 26th May 1999: first events recorded by BaBar.

10. To analyse this enormous dataset effectively, we need large computing facilities: more than can be provided at SLAC alone. Distributing the analysis to other sites raises many additional research questions:
1. Computing facilities
2. Efficient data selection and processing
3. Data distribution
4. Running analysis jobs at many sites
Most of this development either has benefited, or will benefit, from Grid technologies.

11. Distributed Computing Infrastructure [1. Facilities]
- The distributed model was originally partly motivated by slow networks.
- We now use fast networks to make full use of hardware (especially CPU and disk) at many sites.
- Specialisation at different sites currently concentrates expertise; e.g. RAL is the primary repository of analysis data in the "ROOT" format.
(Diagram: Tier A sites at Lyon, RAL, and Padua; Tier C sites at ~20 universities, 9 in the UK.)

12. RAL Tier A Disk and CPU [1. Facilities]
(Chart: disk and CPU capacity at the RAL Tier A.)

13. RAL Tier A [1. Facilities]
- RAL has now relieved SLAC of most analysis.
- The BaBar analysis environment at RAL tries to mimic SLAC so that external users feel at home; Grid job submission should greatly simplify this requirement.
- Impressive take-up from UK and non-UK users.

14. BaBar RAL Batch Users [1. Facilities]
(Chart: users running at least one non-trivial job each week.)
A total of 153 new BaBar users have registered since December.

15. BaBar RAL Batch CPU Use [1. Facilities]
(Chart: batch CPU use at RAL.)

16. Data Processing [2. Data Processing]
- The full data sample (real and simulated data, in all formats) is currently ~700 TB. Fortunately the processed analysis data is only ~20 TB, but that is still too much to store at most smaller sites.
- Many separate analyses look at different particle decay modes. Most analyses only require access to a sub-sample of the data, typically 1-10% of the total.
- We cannot afford for everyone to access all the data all the time: that would overload the CPU or disk servers.
- We currently specify 104 standard selections ("skims") with more efficient access.
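As a back-of-envelope illustration, the figures quoted above already show why skims matter: a typical analysis touches only a small fraction of the 20 TB analysis sample.

```python
# Rough data-volume arithmetic using the figures quoted above.
FULL_SAMPLE_TB = 700   # all formats, real + simulated data
ANALYSIS_TB = 20       # processed analysis data

# A typical analysis needs only 1-10% of the analysis sample.
lo_tb = 0.01 * ANALYSIS_TB
hi_tb = 0.10 * ANALYSIS_TB
print(f"Typical analysis input: {lo_tb:.1f}-{hi_tb:.1f} TB")
```

So a well-chosen skim turns a 20 TB problem into a 0.2-2 TB one, small enough for a Tier C site to hold locally.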

17. Strategies for Accessing Skims [2. Data Processing]
1. Store an event tag with each event to allow fast selection based on standard criteria. Jobs still have to read past events that aren't selected, and selected sub-samples cannot be distributed to Tier C sites.
2. Index files provide direct access to selected events in the full dataset. File, disk, and network buffering still leaves a significant overhead. Data distribution is possible but complicated, so we are only just starting to use this.
3. Copy selected events into separate files. This gives the fastest access and easy distribution, but uses more disk space: a critical trade-off. Currently this costs us a factor of 4 overhead in disk space, which we will reduce when index files are deployed.
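The idea behind strategy 2 can be sketched in a few lines. This is a minimal illustration, not BaBar's actual event store: an "index file" is just a list of byte offsets for the events passing a skim's selection, letting a job seek straight to them instead of reading past every rejected event.

```python
# Sketch of index-file access (strategy 2): length-prefixed event
# records in one file, plus a per-skim list of byte offsets.
import io
import struct

def write_events(f, events):
    """Append length-prefixed event records; return each record's offset."""
    offsets = []
    for payload in events:
        offsets.append(f.tell())
        f.write(struct.pack("I", len(payload)))  # 4-byte size prefix
        f.write(payload)
    return offsets

def read_event(f, offset):
    """Seek directly to one selected event, skipping everything else."""
    f.seek(offset)
    (size,) = struct.unpack("I", f.read(4))
    return f.read(size)

data_file = io.BytesIO()
offsets = write_events(data_file, [b"event-A", b"event-B", b"event-C"])

# A skim's index stores only the offsets of events passing its
# selection, e.g. the first and third event here:
skim_index = [offsets[0], offsets[2]]
selected = [read_event(data_file, off) for off in skim_index]
```

The remaining overhead mentioned above comes from the fact that seeking still pulls whole disk and network buffers, even for one small event.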

18. Physics Data Selection (Metadata) [2. Data Processing]
- We currently have about a million ROOT files in a deep directory tree, and need a catalogue to facilitate data distribution and allow analysis datasets to be defined.
- An SQL database locates the ROOT files associated with each dataset; files are selected by decay mode, beam energy, etc.
- Each site has its own database: a copy of the SLAC database plus local information (e.g. files on local disk, files to import, local tape backups).
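The kind of catalogue query described above might look like the following. The schema and column names here are invented for illustration; the real BaBar catalogue differs, but the principle is the same: dataset definition becomes an SQL selection over file metadata plus local availability flags.

```python
# Hypothetical sketch of an SQL file catalogue (invented schema,
# not BaBar's actual one), using an in-memory SQLite database.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""CREATE TABLE files (
    path          TEXT,     -- location of the ROOT file
    decay_mode    TEXT,     -- skim / decay mode the file belongs to
    beam_energy   TEXT,     -- e.g. on-peak vs off-peak running
    on_local_disk INTEGER   -- site-local flag: is it here already?
)""")
db.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", [
    ("/data/run1/jpsiks-001.root", "JpsiKs", "onpeak",  1),
    ("/data/run1/dstar-001.root",  "Dstar",  "onpeak",  0),
    ("/data/run2/jpsiks-002.root", "JpsiKs", "offpeak", 1),
])

# Define an analysis dataset: all locally available J/psi K0_S files.
rows = db.execute(
    "SELECT path FROM files "
    "WHERE decay_mode = ? AND on_local_disk = 1 ORDER BY path",
    ("JpsiKs",)).fetchall()
dataset = [path for (path,) in rows]
```

The `on_local_disk` column stands in for the per-site local information mentioned above (files on disk, files to import, tape backups).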

19. Data Distribution [3. Data Distribution]
- Tier A analysis sites currently take all the data. This requires large disks, fast networks, and specialised transfer tools: FTP does not make good use of fast wide-area networks. Data imports are fully automated.
- Tier C sites only take some decay modes. We have developed a sophisticated scheme to import data to Tier A and C sites based on SQL database selections.
- This can involve skimming data files to extract the events from a single decay mode, done automatically as an integral part of the import procedure.
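The selection logic behind such an import scheme can be sketched as follows. All names and the catalogue layout are invented for illustration; the point is the three cases: a Tier A site takes everything, a Tier C site copies files that contain only its subscribed decay modes, and files mixing wanted and unwanted modes get skimmed during the import.

```python
# Sketch (invented names) of per-site import planning based on
# catalogue selections, as described above.

CATALOGUE = [
    {"path": "jpsiks-001.root", "modes": {"JpsiKs"}},
    {"path": "mixed-007.root",  "modes": {"JpsiKs", "Dstar"}},
    {"path": "dstar-003.root",  "modes": {"Dstar"}},
]

def import_plan(site_modes):
    """Return (files to copy whole, files to skim during import).

    site_modes is the set of decay modes a Tier C site subscribes
    to, or None for a Tier A site that takes all the data.
    """
    copy, skim = [], []
    for entry in CATALOGUE:
        if site_modes is None:               # Tier A: take everything
            copy.append(entry["path"])
        elif entry["modes"] <= site_modes:   # only wanted modes: copy whole
            copy.append(entry["path"])
        elif entry["modes"] & site_modes:    # mixed file: skim our events out
            skim.append(entry["path"])
        # files with no wanted modes are not imported at all
    return copy, skim

copy, skim = import_plan({"JpsiKs"})  # a Tier C site wanting one mode
```

Running the same function with `None` reproduces the Tier A case: every file is copied and nothing needs skimming.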

20. Remote Job Submission: Why? [4. Job Submission]
- The traditional model of distributed computing relies on people logging into each computing centre and building and submitting jobs from there.
- Each user has to have an account at each site and write or copy their analysis code to that facility.
- Fine for one site, maybe two. Any more is a nightmare for site managers (user registration and support) and for users (setting everything up from scratch).

21. Remote Job Submission [4. Job Submission]
- A better model would allow everyone to submit jobs to different Tier A sites directly from their home university, or even from a laptop. This simplifies local analysis code development and debugging while providing access to the full dataset and large CPU farms. It is a classic Grid application.
- It requires significant infrastructure: authentication and authorisation; a standardised job submission environment (Grid software versions, batch submission interfaces); and the program and configuration for each job have to be sent to the executing site, with the results returned at the end.
- We are just now starting to use this for real analysis jobs.
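The last infrastructure point above, shipping the program and configuration with the job, is essentially a sandboxing step. The sketch below shows the generic idea only; it is not the EDG or Globus API, and all file names are invented: the user's code and configuration are bundled into one archive that travels with the job request, and the output would come back the same way.

```python
# Sketch of a job "input sandbox": bundle analysis code + config
# into one archive to send to the executing site (generic idea,
# not the actual Grid middleware interface).
import io
import os
import tarfile
import tempfile

def bundle_job(paths):
    """Pack the given files into a gzipped tar archive in memory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for p in paths:
            tar.add(p, arcname=os.path.basename(p))
    return buf.getvalue()

with tempfile.TemporaryDirectory() as tmp:
    exe = os.path.join(tmp, "analysis.py")   # invented example files
    cfg = os.path.join(tmp, "job.cfg")
    with open(exe, "w") as f:
        f.write("print('run analysis')\n")
    with open(cfg, "w") as f:
        f.write("dataset = JpsiKs\n")
    sandbox = bundle_job([exe, cfg])  # sent along with the job request
```

On the executing site, unpacking this archive into the job's working directory recreates the user's environment without the user ever needing a login account there.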

22. The Wider Grid
- We are already using many of the systems being developed for the European and US DataGrids: Globus, EDG job submission, CA, VO, RB, high-throughput FTP, SRB.
- We are investigating the use of many more: RLS, Spitfire, R-GMA, VOMS, ...
- We are collaborating with other experiments: BaBar is a member of EDG WP8 and PPDG (the European and US particle physics Grid applications groups).
- We are providing some of the first Grid technology use-cases.

23. Summary
- BaBar is using B decays to measure matter-antimatter asymmetries and perhaps explain why the universe is matter-dominated.
- Without distributing the data and computing, we could not meet the computing requirements of this high-luminosity machine.
- Our initial ad hoc architecture is evolving towards a more automated system: borrowing ideas, technologies, and resources from the Grid, and providing ideas and experience for it in return.
