Distributed Analysis in the BaBar Experiment
Tim Adye, Particle Physics Department, Rutherford Appleton Laboratory
University of Oxford, 11th November 2002

Talk Plan: physics motivation; the BaBar experiment; distributed analysis and the Grid.

Where did all the Antimatter Go?
Nature treats matter and antimatter almost identically, but the Universe is made up of just matter. How did this asymmetry arise? The "Standard Model" of particle physics allows for a small matter-antimatter asymmetry in the laws of physics, seen in some K0-meson decays (e.g. a 0.3% asymmetry). This "CP Violation" in the Standard Model is not large enough on its own to explain the cosmological matter-antimatter asymmetry. Until recently, CP Violation had only been observed in K decays; to understand more, we need examples from other systems…

What BaBar is looking for
The Standard Model also predicts that we should be able to see the effect in B-meson decays. B mesons can decay in hundreds of different modes. In decays such as B0 → J/ψ K0S we look for differences in the time-dependent decay rate between B0 and anti-B0 (B̄0).
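For reference (this expression does not appear on the slide, but it is the standard Standard Model form for this decay mode, neglecting the width difference of the neutral B mesons), the quantity measured is the time-dependent CP asymmetry, whose oscillation amplitude is sin 2β:

    A_{CP}(t) = \frac{\Gamma(\bar{B}^0(t) \to J/\psi K^0_S) - \Gamma(B^0(t) \to J/\psi K^0_S)}
                     {\Gamma(\bar{B}^0(t) \to J/\psi K^0_S) + \Gamma(B^0(t) \to J/\psi K^0_S)}
              = \sin 2\beta \, \sin(\Delta m_d \, t)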

First Results (summary of the summary)
First results from BaBar (and its rival experiment, Belle) confirm the Standard Model of particle physics: the observed CP Violation is too small to explain the cosmological matter-antimatter asymmetry. … but there are many more decay modes to examine. We are making more than 80 measurements with different B-meson, charm, and τ-lepton decays.

Experimental Challenge
Individual decays of interest are only 1 in 10^4 to 10^6 B-meson decays. We are looking for a subtle effect in rare (and often difficult to identify) decays, so we need to record the results of a large number of events.
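As a rough illustration (using the ~10^8 recorded B0 B̄0 decays quoted on the detector slide later in the talk, and ignoring reconstruction efficiency), the rarest modes of interest leave only of order a hundred candidate events in the full sample:

    N_{signal} \sim 10^{8} \times 10^{-6} = 10^{2}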

The BaBar Collaboration: 9 countries, 74 institutions, 566 physicists.

The PEP-II e+e− Ring at SLAC (diagram): the Linear Accelerator feeds the Low Energy Ring (e+, 3.1 GeV) and the High Energy Ring (e−, 9.0 GeV); the PEP-II ring has a circumference of 2.2 km, with BaBar at the interaction point.

The BaBar Detector
~10^8 B0 B̄0 decays recorded. 26th May 1999: first events recorded by BaBar.

To analyse this enormous dataset effectively, we need large computing facilities – more than can be provided at SLAC alone. Distributing the analysis to other sites raises many additional research questions:
1. Computing facilities
2. Efficient data selection and processing
3. Data distribution
4. Running analysis jobs at many sites
Most of this development either has benefited, or will benefit, from Grid technologies.

Distributed computing infrastructure (1. Facilities)
The distributed model was originally motivated partly by slow networks; we now use fast networks to make full use of hardware (especially CPU and disk) at many sites. Specialisation at different sites currently concentrates expertise, e.g. RAL is the primary repository of analysis data in the "ROOT" format.
(Diagram: Tier A sites include RAL, Lyon, and Padua; Tier C sites are ~20 universities, 9 in the UK.)

RAL Tier A Disk and CPU (1. Facilities)

RAL Tier A (1. Facilities)
RAL has now relieved SLAC of most analysis. The BaBar analysis environment at RAL tries to mimic SLAC so that external users feel at home; Grid job submission should greatly simplify this requirement. There has been impressive take-up from UK and non-UK users.

BaBar RAL Batch Users, i.e. those running at least one non-trivial job each week (1. Facilities). A total of 153 new BaBar users have registered since December.

BaBar RAL Batch CPU Use (1. Facilities)

Data Processing (2. Data Processing)
The full data sample (real and simulated data) in all formats is currently ~700 TB. Fortunately the processed analysis data is only ~20 TB – still too much to store at most smaller sites. Many separate analyses look at different particle decay modes, and most only require access to a sub-sample of the data, typically 1-10% of the total. We cannot afford to have everyone access all the data all the time, which would overload the CPU or disk servers, so we currently specify 104 standard selections ("skims") that allow more efficient access.

Strategies for Accessing Skims (2. Data Processing)
1. Store an event tag with each event to allow fast selection based on standard criteria. We still have to read past events that aren't selected, and cannot distribute selected sub-samples to Tier C sites.
2. Index files provide direct access to selected events in the full dataset. File, disk, and network buffering still leaves significant overhead. Data distribution is possible but complicated, so we are only just starting to use this.
3. Copy some selected events into separate files. This gives the fastest access and easy distribution, but uses more disk space – a critical trade-off. Currently this costs us a factor of 4 overhead in disk space; we will reduce this when index files are deployed.
A sketch contrasting the first two strategies is given below.
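A minimal Python sketch of the difference between the tag-scan and index-file strategies; the function and attribute names (tag_bits, event positions) are purely illustrative and are not BaBar's real interfaces:

    def scan_with_tag(events, skim_name):
        """Strategy 1: read every event and keep those whose stored tag
        marks them as belonging to the requested skim; rejected events
        must still be read past."""
        for event in events:
            if skim_name in event.tag_bits:
                yield event

    def read_via_index(events, index_positions):
        """Strategy 2: an index file lists the positions of the selected
        events, so only those events are fetched from the full dataset
        (file, disk, and network buffering overheads remain)."""
        for pos in index_positions:
            yield events[pos]

    # Strategy 3 simply copies the selected events into their own files:
    # fastest access and easiest distribution, at the cost of extra disk space.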

Physics Data Selection (metadata) (2. Data Processing)
We currently have about a million ROOT files in a deep directory tree, so we need a catalogue to facilitate data distribution and allow analysis datasets to be defined. An SQL database locates the ROOT files associated with each dataset, with file selection based on decay mode, beam energy, etc. Each site has its own database, comprising a copy of the SLAC database plus local information (e.g. files on local disk, files to import, local tape backups). A query of the kind this catalogue supports is sketched below.
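A minimal sketch (using sqlite3 purely for illustration) of the kind of query such a catalogue supports; the table and column names (files, skim, beam_energy, on_local_disk, path) are assumptions, not the real BaBar schema:

    import sqlite3

    def files_for_dataset(db_path, skim, beam_energy):
        """Return the locally available ROOT file paths for one dataset."""
        conn = sqlite3.connect(db_path)
        try:
            rows = conn.execute(
                "SELECT path FROM files"
                " WHERE skim = ? AND beam_energy = ? AND on_local_disk = 1",
                (skim, beam_energy),
            )
            return [path for (path,) in rows]
        finally:
            conn.close()

    # e.g. files_for_dataset("site_catalogue.db", "JpsiKs", "onpeak")
    # ("JpsiKs" and "onpeak" are invented example values)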

Data Distribution (3. Data Distribution)
Tier A analysis sites currently take all the data. This requires large disks, fast networks, and specialised transfer tools (FTP does not make good use of fast wide-area networks); the data imports are fully automated. Tier C sites only take some decay modes. We have developed a sophisticated scheme to import data to Tier A and C sites based on SQL database selections; this can involve skimming data files to extract the events from a single decay mode, which is done automatically as an integral part of the import procedure. The outline below sketches the idea.
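Roughly, the automated import might be pictured as below; every name here is hypothetical, and the real system uses the specialised transfer tools mentioned above rather than a plain file copy:

    import os
    import shutil

    def run_import(pending_files, local_dir, wanted_mode=None, skim_file=None):
        """Fetch each file listed by the site database as awaiting import;
        Tier C sites may additionally skim each file down to a single decay
        mode as part of the same step."""
        for remote_path in pending_files:
            local_path = os.path.join(local_dir, os.path.basename(remote_path))
            shutil.copy(remote_path, local_path)    # stand-in for the real transfer tool
            if wanted_mode and skim_file:
                skim_file(local_path, wanted_mode)  # keep only the wanted decay mode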

Remote Job Submission: Why? (4. Job Submission)
The traditional model of distributed computing relies on people logging into each computing centre, then building and submitting jobs from there. Each user has to have an account at each site and write or copy their analysis code to that facility. That is fine for one site, maybe two; any more is a nightmare for site managers (user registration and support) and for users (setting everything up from scratch).

Remote Job Submission (4. Job Submission)
A better model is to allow everyone to submit jobs to the different Tier A sites directly from their home university, or even from a laptop. This simplifies local analysis code development and debugging while providing access to the full dataset and large CPU farms – a classic Grid application. It requires significant infrastructure: authentication and authorisation; a standardised job submission environment (Grid software versions, batch submission interfaces); and the program and configuration for each job have to be sent to the executing site, with the results returned at the end. We are just now starting to use this for real analysis jobs. An illustrative sketch of such a submission follows below.
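Purely as an illustration, a remote submission along these lines could look roughly like the sketch below. The JDL attributes (Executable, StdOutput, StdError, InputSandbox, OutputSandbox) are those of the EDG job submission tools mentioned on the next slide; the script and configuration file names are invented, and this is not BaBar's actual submission machinery:

    import textwrap

    # The job's program and configuration travel to the executing site in
    # the input sandbox; results come back in the output sandbox.
    jdl = textwrap.dedent("""\
        Executable    = "runMyAnalysis.sh";
        StdOutput     = "job.out";
        StdError      = "job.err";
        InputSandbox  = {"runMyAnalysis.sh", "myAnalysis.tcl"};
        OutputSandbox = {"job.out", "job.err", "results.root"};
    """)

    with open("analysis.jdl", "w") as f:
        f.write(jdl)

    # Submission itself then goes through the Grid middleware, e.g. the EDG
    # command line: edg-job-submit analysis.jdl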

The Wider Grid
We are already using many of the systems being developed for the European and US DataGrids – Globus, EDG job submission, Certificate Authorities (CA), Virtual Organisations (VO), the Resource Broker (RB), high-throughput FTP, SRB – and are investigating the use of many more (RLS, Spitfire, R-GMA, VOMS, …). We are collaborating with other experiments: BaBar is a member of EDG WP8 and PPDG (the European and US particle physics Grid applications groups). We are providing some of the first Grid technology use-cases.

Summary
BaBar is using B decays to measure matter-antimatter asymmetries and perhaps explain why the Universe is matter dominated. Without distributing the data and computing, we could not meet the computing requirements of this high-luminosity machine. Our initial ad-hoc architecture is evolving towards a more automated system – borrowing ideas, technologies, and resources from the Grid, and providing ideas and experience for it.