On the Verge of One Petabyte – the Story Behind the BaBar Database System Jacek Becla Stanford Linear Accelerator Center For the BaBar Computing Group.

1 On the Verge of One Petabyte – the Story Behind the BaBar Database System Jacek Becla Stanford Linear Accelerator Center For the BaBar Computing Group

2 Outline (2 of 18, CHEP’03)
• Talk will cover
  – Our experience with running a large-scale DB system
    ▪ Achievements, issues
    ▪ New development and what drives it
  – Main focus on period since last CHEP

3 Providing Persistency for BaBar
• Growing complexity and demands
• Changing requirements
• Hitting unforeseen limits in many places
• Non-trivial maintenance
  – Most problems are persistent-technology independent
  – System becoming more and more distributed
• Very lively environment
  – Production not as stable as one would imagine

4 Some Numbers
• 750+ TB of data
• 0.5+ million DB files
• Several billion events
• 60+ million collections
• 1000+ simultaneous analysis jobs accessing the DB are common

5 Data Availability is Essential
• PromptCalibration
  – Rapid feedback, keeping up with detector
• Event Reco (ER)
  – Data available for analysis within a week
• Reprocessing
  – All data reprocessed before conferences
• Analysis
  – Outages < 4%
    ▪ Driven mostly by power outages, hardware failures

6 What Changed Since Sep'01/last CHEP?
• Event Reconstruction
  – 4 output physics streams → 20
  – 20 output streams → 5 + 115 pointer collections
  – Rolling calibrations now separated
  – Runs now processed in parallel
  – Raw and rec not persisted anymore
  – Planning to run skim production separately
continued…

7 What Changed Since Sep'01/last CHEP?
• Simulation Production
  – 1.5 → 3 MC events per real event
  – ~8 → ~24 production sites
• Analysis
  – Bridge federations now fully functional
  – Significant system growth
    ▪ 29 data servers, 34 lock/journal servers
    ▪ 66 TB disk space, 101 slave federations

8 Some Challenges
• Setting up ER/REP in Padova
  – All Linux based
    ▪ Recovery from Linux-client crashes leaving connections on server side
    ▪ Three data corruptions
      1. Understood and fixed: race condition, file descriptors closed/reopened incorrectly
      2. Never understood, went away after power outage (Dec'02)
         ▪ Not sure who is at fault: Objectivity? Linux kernel?
      3. Problems with B-Tree index updates in Temporal Database
• Imposed by our software
  – Lock collisions
  – Large number of skim collections
    ▪ Overflowing containers

9 Some New Features
• Bridge federations
  – All 3 phases deployed
• Data compression
• New Conditions DB (CDB)
• Automatic load balancing

10 Conditions DB
• Main features
  – New conceptual model for metadata
    ▪ 2-d space of validity and insertion time, revisions, persistent configurations, types of conditions, hierarchical namespace for conditions
  – Flexible user data clustering
  – Support for distributed updates and use
  – State ID
  – Scalability problems solved
  – Significant (100-1000x) speedup for critical use cases
• Status
  – In production since Fall’02
  – Data converted to new format
  – Working on distributed management tools
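
The "2-d space of validity and insertion time" can be illustrated with a small lookup sketch. This is not the BaBar CDB API: the class `ConditionInterval`, the function `lookup`, and the use of integer timestamps are all hypothetical, chosen only to show how a condition could be selected by both the event time it is valid for and the moment it was inserted.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class ConditionInterval:
    """One stored condition (hypothetical layout, not the real CDB schema)."""
    valid_from: int   # start of validity, inclusive
    valid_to: int     # end of validity, exclusive
    inserted_at: int  # when this entry was written to the database
    payload: str      # the condition data itself


def lookup(entries: Iterable[ConditionInterval],
           event_time: int,
           as_of: int) -> Optional[str]:
    """Return the payload valid at event_time, as seen at insertion time as_of.

    Among all entries whose validity interval covers event_time and that
    were inserted no later than as_of, the most recently inserted one wins.
    """
    candidates = [
        e for e in entries
        if e.valid_from <= event_time < e.valid_to and e.inserted_at <= as_of
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.inserted_at).payload


entries = [
    ConditionInterval(valid_from=0, valid_to=100, inserted_at=1, payload="v1"),
    ConditionInterval(valid_from=50, valid_to=100, inserted_at=5, payload="v2"),
]
```

With these two entries, a query for event time 60 sees `v1` when asked "as of" insertion time 3 and `v2` when asked as of insertion time 10, so an earlier view of the conditions stays reproducible after a later correction is inserted.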

11 AMS Load Balancing
• Dynamically stages in/replicates files
  – Based on configurable parameters and host load
• Increases fault tolerance
  – Data servers can be taken offline transparently
• Scalable
  – Hierarchical
• Currently being tested
[Diagram labels: Dynamic Selection; Distinguished AMS]
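
A minimal sketch of load-based server selection with replication, under stated assumptions. The slide gives no algorithm, so the function `select_server`, the `max_load` threshold, and the dictionary layout for servers are all hypothetical; the code only illustrates the general idea of choosing a data server by host load and replicating a file when no suitable holder is available.

```python
def select_server(file_name, servers, max_load=0.8):
    """Pick a server for file_name; return (server_name, replicated).

    servers maps a name to {"load": float, "files": set, "online": bool}.
    max_load stands in for a "configurable parameter" (assumed value).
    """
    # Offline servers are ignored, so one can be removed without clients noticing.
    online = {name: s for name, s in servers.items() if s.get("online", True)}
    if not online:
        return None, False

    # Prefer an online server that already holds the file and is not overloaded.
    usable = [
        name for name, s in online.items()
        if file_name in s["files"] and s["load"] <= max_load
    ]
    if usable:
        return min(usable, key=lambda name: online[name]["load"]), False

    # Otherwise use the least-loaded online server, replicating the file if needed.
    target = min(online, key=lambda name: online[name]["load"])
    replicated = file_name not in online[target]["files"]
    online[target]["files"].add(file_name)
    return target, replicated


servers = {
    "a": {"load": 0.9, "files": {"f1"}, "online": True},
    "b": {"load": 0.2, "files": set(), "online": True},
    "c": {"load": 0.1, "files": {"f2"}, "online": False},
}
first = select_server("f1", servers)
second = select_server("f1", servers)
```

Here server `a` holds `f1` but is above the load threshold, so the first request replicates the file to `b`; the second request is then served by `b` without another copy. Server `c` is offline and is never chosen.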

12 Size
• Raw/rec not persisted
  – Event: ~200 kB → ~20 kB
• Continues to grow fast
  – Higher luminosity
  – 115 skims
  – Reprocessing all data every year
  – More MC events (1.5:1 → 3:1)
• Reducing size
  – Event store redesign (see talk by Yemi tomorrow)
  – Data compression (achieving ~2:1 compression)

13 Media Attention
• World’s largest database
  – 500 TB, see SLAC press release (Apr/02)
• Many ideas/problems/solutions common to any large scale database system
• Newspaper and local TV coverage
  – Non-HEP attention

14 Size matters in data world
[Press headlines shown on the slide]
• Mountains Of Data: 500 Terabytes And Counting
• A firm grip, or gagging on gigabytes?
• Stanford claims world's largest database
• 500,000 gigabytes and growing: SLAC houses world's largest database
• University database breaks world record
• Stanford Linear Accelerator Database Reaches 500,000 Gigabytes
• Stanford researchers may have world’s largest database

15 New Computing Model
• Discussed Fall’02
• Main decisions
  – Two stage approach
    ▪ Develop "new micro" in ROOT-based (alternative to nTuples)
    ▪ Develop full event store in ROOT-based
  – Deprecate ROOT-based conditions
    ▪ Use existing Objy-based conditions
• Main reasons to change
  – To follow general HEP trend
  – To allow interactive analysis in ROOT

16 Summary
• DB system keeps up with excellent B-Factory performance
  – No major problems/showstoppers
  – Coping with growing size, complexity and demands
• Event store technology based on Objectivity
  – A good, working model, proven in production
  – Not well proven in analysis
    ▪ Most users extract data to nTuples
  – Likely to be deprecated soon
• May'99 - Mar'03
  – Undoubtedly a successful chapter for the BaBar DB

17 Acknowledgements
• Development Team
  – Andy Hanushevsky
  – Andy Salnikov (online databases)
  – Daniel Wang (started Sep’02)
  – David Quarrie (gone Oct’01)
  – Igor Gaponenko
  – Simon Patton (gone March ’02)
  – Yemi Adesanya
• Operations Team
  – Adil Hasan
  – Artem Trunov
  – Wilko Kroeger
  – Tofigh Azemoon

18 Some Related BaBar Talks
• Operation Aspects of Dealing with the Large BaBar Data Set
  – Category 8, Tuesday 3:30pm
• The Redesigned BaBar Event Store – Believe the Hype
  – Category 8, Tuesday 4:50pm
• BdbServer++: A User Instigated Data Location and Retrieval Tool
  – Category 2
• Distributing BaBar Data Using SRB
  – Category 2
• Distributed Offline Data Reconstruction in BaBar
  – Category 3, Tuesday, 6:10pm

