Slide 1: Using an Object Oriented Database to Store BaBar's Terabytes
Tim Adye, Particle Physics Department, Rutherford Appleton Laboratory (CLRC). 25 February 2000.

Slide 2: Outline
- The BaBar experiment at SLAC
- Data storage requirements
- Use of an Object Oriented Database
- Data organisation: at SLAC and in the UK
- Future experiments

Slide 3: The BaBar experiment is based in California at the Stanford Linear Accelerator Center, and was designed and built by more than 500 physicists from 10 countries, including 9 UK universities and RAL. It is looking for the subtle differences between the decay of the B0 meson and its antiparticle (the anti-B0). If this "CP violation" is large enough, it could explain the cosmological matter-antimatter asymmetry. We are looking for a subtle effect in a rare (and difficult to identify) decay, so we need to record the results of a large number of events.

Slide 4: (no text in the transcript)

Slide 5: A BaBar Event

Slide 6: How much data?
- Since BaBar started operation last May, we have recorded and analysed 210 million events: 48 million were written to the database, and the remainder rejected after analysis.
- At least 4 more years' running, with continually improving luminosity.
- Eventually we will record data at ~100 Hz, i.e. ~10^9 events/year.
- Each event uses kb.
- We also need to generate 1-10 times that number of simulated events.
- The database currently holds 33 Tb; we expect to reach ~300 Tb/year, i.e. 1-2 Pb over the lifetime of the experiment (see the rough arithmetic below).
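
As a rough cross-check of these numbers, the sketch below multiplies event rate by event size to get an annual volume. Only the ~100 Hz rate and the ~300 Tb/year target come from the slide; the live-time fraction and average event size are illustrative assumptions made for this example.

```cpp
// Back-of-the-envelope estimate of BaBar's annual data volume.
// The live-time fraction and average event size are ASSUMPTIONS for
// illustration; the ~100 Hz logging rate is the figure on the slide.
#include <iostream>

int main() {
    const double rateHz         = 100.0;   // event logging rate (from the slide)
    const double liveFraction   = 0.3;     // assumed fraction of the year spent taking data
    const double secondsPerYear = 3.15e7;
    const double kbPerEvent     = 300.0;   // assumed average stored size per event, all detail levels

    const double eventsPerYear = rateHz * liveFraction * secondsPerYear;  // ~1e9, as on the slide
    const double tbPerYear     = eventsPerYear * kbPerEvent / 1e9;        // 1 Tb = 1e9 kb

    std::cout << "events/year ~ " << eventsPerYear << "\n";
    std::cout << "data volume ~ " << tbPerYear << " Tb/year\n";
    // Adding 1-10x as many simulated events, several years of running
    // quickly reaches the 1-2 Pb scale quoted on the slide.
    return 0;
}
```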

Slide 7: Why an OODBMS?
- BaBar has adopted C++ and OO techniques: the first large HEP experiment to do so wholesale.
- An OO database has a more natural interface for C++ (and Java); a sketch of the idea follows this slide.
- We require a distributed database: event processing and analysis take place on many processors (a 200-node farm at SLAC), and a single data server cannot cope.
- Data structures will change over time, and we cannot afford to reprocess everything: schema evolution handles this.
- Objectivity was chosen; it was also the front runner at CERN.
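
To illustrate the "natural interface" point, here is a minimal sketch of what persistent C++ objects look like in an object database. PersistentObject and Ref<> are hypothetical stand-ins invented for this example, not the real Objectivity/DB classes; the point is only that the class definition itself describes the stored object, so no relational mapping layer is needed.

```cpp
// Schematic sketch only: PersistentObject and Ref<> are hypothetical
// stand-ins, NOT the real Objectivity/DB API.  The idea is that the
// same C++ class describes both the in-memory and the on-disk object.
#include <vector>

class PersistentObject {};               // gives an object an identity in the database (hypothetical)

template <class T>
class Ref {                              // smart reference stored in the database (hypothetical)
public:
    T*       operator->()       { return ptr_; }
    const T* operator->() const { return ptr_; }
private:
    T* ptr_ = nullptr;                   // in a real ODBMS this would be a persistent object ID
};

class Track : public PersistentObject {
public:
    double momentum = 0.0;
    int    charge   = 0;
};

class Event : public PersistentObject {
public:
    long                    run   = 0;
    long                    event = 0;
    std::vector<Ref<Track>> tracks;      // object references, not foreign keys in tables
};

int main() {
    Event ev;                            // in the real system this would be created and committed
    ev.run = 1;                          // inside a database transaction; schema evolution lets a
    ev.event = 42;                       // new data member be added later without reprocessing
    return 0;                            // the events already on disk
}
```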

Slide 8: How do we organise the data?
- Traditional HEP analyses read each event and select the relevant ones, for which additional processing is done. This can be done with a sequential file. Many different analyses are performed by BaBar physicists.
- In BaBar there is too much data: it won't work if everyone reads all of the data all of the time, even if all of it could be on disk.
- So we organise the data into different levels of detail, stored in separate files: tag, "microDST", "miniDST", full reconstruction, raw data.
- Objectivity keeps track of the cross-references, so we only read the more detailed information for selected events; different analyses make different selections (see the sketch below).
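
The sketch below shows how this tiering looks from the analysis side, again using a hypothetical Ref<> type rather than the real Objectivity classes: the loop touches only the compact tag for every event, and fetches the larger components only for events that pass the selection.

```cpp
// Sketch of the "levels of detail" organisation.  Ref<> is a hypothetical
// lazy reference; in the real system Objectivity maintains the
// cross-references between files and fetches objects on dereference.
#include <vector>

template <class T>
class Ref {
public:
    // Hypothetical: a real implementation would read the object from its
    // own database file; here it just returns a cached dummy instance.
    const T& fetch() const { static T dummy; return dummy; }
};

struct MicroDst { double candidateMass = 0.0; /* summary physics quantities */ };
struct MiniDst  { /* more detailed reconstruction output */ };
struct FullReco { /* full reconstruction */ };
struct RawData  { /* raw detector data */ };

struct EventTag {                 // small, kept on disk for every event
    bool          twoLeptons  = false;
    double        totalEnergy = 0.0;
    Ref<MicroDst> micro;          // cross-references to the larger detail levels
    Ref<MiniDst>  mini;
    Ref<FullReco> reco;
    Ref<RawData>  raw;
};

void analyse(const std::vector<EventTag>& tags) {
    for (const EventTag& tag : tags) {
        if (!tag.twoLeptons) continue;              // cheap cut using the tag alone
        const MicroDst& micro = tag.micro.fetch();  // read the summary only for selected events
        (void)micro;
        // Only a handful of final candidates would ever need tag.reco or tag.raw,
        // and a different analysis would apply a different tag selection.
    }
}

int main() { analyse({}); return 0; }
```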

Slide 9: What happens at SLAC?
- We cannot store everything on disk: maybe 10 Tb, but not 1 Pb. We are already buying ~1 Tb of disk per month.
- Analysis requires frequent access to summary information, so we keep the tag and "microDST" on disk, keep more information on disk for the most interesting events, and leave the rest in the mass store (HPSS at SLAC); the sketch below summarises this.
- In future even this may not be enough: perhaps only the tag and a "dataset of the month" on disk?
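
The disk versus mass-store policy on this slide amounts to a simple rule per detail level. The sketch below is a paraphrase of the slide, not BaBar code; "interesting" stands for whatever selection marks an event as worth keeping in more detail on disk.

```cpp
// Hedged sketch of the storage placement policy described on the slide.
// The rule is a paraphrase of the bullet points, not code from the experiment.
enum class Level { Tag, MicroDst, MiniDst, FullReco, Raw };
enum class Store { Disk, MassStore };          // MassStore = HPSS at SLAC

Store placement(Level level, bool interestingEvent) {
    switch (level) {
        case Level::Tag:
        case Level::MicroDst:
            return Store::Disk;                 // always kept on disk for fast analysis access
        case Level::MiniDst:
            return interestingEvent ? Store::Disk       // extra detail on disk only for the
                                    : Store::MassStore; // most interesting events
        default:
            return Store::MassStore;            // full reconstruction and raw data live in HPSS
    }
}

int main() { return placement(Level::Raw, false) == Store::MassStore ? 0 : 1; }
```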

Slide 10: Performance
- The main challenge is getting this to scale to hundreds of processes and processors reading and writing at the same time.
- The vendor seems to believe we can do it. The top news item on the Objectivity web site read: "The Terabyte Wars are over. While other vendors quarrel about who can store 1 Terabyte in a database, the BaBar physics experiment at the Stanford Linear Accelerator Center (SLAC) has demonstrated putting 1 Terabyte of data PER DAY into an Objectivity Database."
- But it took a lot of work...

Slide 11: Performance Scaling
- A lot of effort has gone into improving the speed of recording events.
- Work is ongoing to obtain similar improvements in data access.

Slide 12: Regional Centres
- We cannot do everything at SLAC: even with all the measures to improve analysis efficiency there, SLAC cannot support the entire collaboration, and the network connection from the UK is slow, sometimes very slow, and occasionally unreliable.
- We therefore need to allow analysis outside SLAC, at "Regional Centres" in the UK, France, and Italy. RAL is the UK Regional Centre.
- It is a major challenge to transfer the data from SLAC, and to reproduce the databases and analysis environment at RAL.

Slide 13: UK Setup
- At RAL we have Sun analysis and data server machines with 5 Tb of disk; UK universities have Tb locally. All of this is part of an £800k JREI award.
- We import the microDST using DLT-IV tapes (~50 Gb/tape with compression); so far 2 Tb has been exported.
- The system is interfaced to the Atlas Datastore (see Tim Folkes' talk): less-used parts of the federation can be archived and brought back to disk on demand (this needs further automation), and it also acts as a local backup.

Slide 14: Other Experiments
- BaBar's requirements are modest with respect to what is to come.
- Tevatron Run II: ~1 Pb/year. The CDF JIF (Joint Infrastructure Fund) award includes a Regional Centre at RAL.
- LHC experiments: many Pb/year. A prototype Tier 1 centre at RAL (3100 PC99, 125 Tb disk, 0.3 Pb tape, 50 Mbps network to CERN) and Tier 2 centres at Edinburgh and Liverpool; roughly 5 times more will be required at startup.

Slide 15: Future Software
- Choice of HSM: HPSS is expensive, and maybe we don't need all the bells and whistles, but it is already in use at SLAC/CERN/... Alternatives include EuroStore (EU/CERN/DESY/...), ENSTORE (Fermilab) and CASTOR (CERN). The LHC experiments still have time to decide...
- Is Objectivity well-suited to our use? Should we develop our own OODBMS, such as Espresso (CERN)? BaBar is being watched closely...