
1. Objectivity Data Migration
Marcin Nowak, CERN Database Group, http://cern.ch/db
CHEP 2003, March 24-28, La Jolla, California

2. Overview
- Objectivity/DB at CERN: history and the current situation
- Migration plans
- Designing a new storage system based on an RDBMS
- Migrating the data: system setup, performance, results
- The new storage system in production
- Conclusions

3. Objectivity/DB at CERN
- Objectivity/DB was introduced at CERN in 1995 through RD45:
  - a fully object-oriented database management system
  - strong C++ and Java bindings
  - petabyte scalability
- It has been used successfully to store physics data by many experiments, both LHC and pre-LHC; the main focus was always on LHC applications.
- Following the change of the LHC experiments' persistency baseline, the Objectivity/DB maintenance contract for CERN was discontinued at the end of 2001.
- Existing CERN Objectivity licenses are perpetual and allow using the software on the supported platforms indefinitely:
  - Objectivity v6.1.3: RedHat 6.x with g++ 2.95.2, and Solaris 7-8
  - No maintenance means no new releases or bug fixes, and no support for new versions of operating systems or compilers
  - Supporting Objectivity applications is therefore increasingly difficult
- End of Objectivity support after Q2 2003 was agreed with the experiments through the FOCUS committee.

4. Objectivity Databases in CERN Experiments
- LHC experiments
  - Event data: so far only test data, with no need to preserve it for long. The longer time scale permits waiting for a new, complete storage solution from the LHC Computing Grid (LCG) projects; see the POOL project presentations in the 'Persistency' track on Monday.
  - Measurement data (detector construction data, etc.): migration to relational databases (JDBC, MySQL).
- Pre-LHC experiments
  - COMPASS: 300 TB of event data, collected mainly in 2002. A running experiment that will collect more data in 2003-2004; expecting 0.5-1 PB in total.
  - HARP: 30 TB of event data.

5. Migration Project
- Migration project scope:
  - Migrate HARP and COMPASS physics data to a completely new persistency technology.
  - The COMPASS migration has to be finished before 2003 data taking starts (May 2003): since the storage system cannot be changed during data taking, missing this date risks an additional year of Objectivity support.
  - Migrate the data to new tape media, available only at the end of 2002.
- Migration tasks:
  - Design and development of a new storage system
  - Data migration
  - Adaptation of the experiments' software frameworks
- Manpower: 2-3 FTEs. Budget: none - use CERN/IT shared hardware resources.

6. Designing the New System
- The databases of both experiments share basic design principles:
  - Raw events are stored as BLOBs in the original online format (DATE); the database files are kept in Castor (the CERN HSM).
  - Limited event metadata (RUN information, event headers).
  - Conditions data is kept in the ConditionsDB (CDB).
  - COMPASS stores reconstructed events (DST) as persistent objects.
  - Physics analysis data is stored outside Objectivity.
- The proposed new data storage system:
  - A hybrid solution based on a relational database and flat files, a la POOL.
  - Preserves essential features of the current system: navigational access to events and reconstructed events.

7. New Data Storage - Details
- Raw events: original DATE format, flat files kept in Castor.
- Metadata: mainly event headers and navigational information for raw and reconstructed events, kept in a relational database.
  - 0.1%-0.2% of the raw data volume
  - At CERN stored in Oracle (but without Oracle-specific features), leaving open the option of a different database at outside institutes.
- Conditions data: migrated to the new CDB implementation based on Oracle; no interface change (abstract interface).
- DST: similar to raw events, but using hard-coded object streaming.

8. Database Schema
Legend: # attribute is part of the primary key; * attribute cannot be null; o null value allowed; u attribute is part of a unique constraint; r attribute is a foreign key. Relations are one-to-many (necessary or possible), and the foreign key is part of the primary key of the dependent table.
- RUN: # run number, o time, o status, o logbook
- RAW FILE: # file ID, u file name
- EVENT HDR: # event number, * event size, * event filepos, * burst number, * event in burst, * trigger mask, * time, * error code
- DST FILE: # file ID, u file name, * DST version, * DST type, o value1 descr, o value2 descr, o value3 descr
- DST HEADER: # event number, * DST size, * DST filepos, * trigger mask, o value1, o value2, o value3
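For concreteness, below is a minimal Oracle DDL sketch of the raw-event part of this schema. Table and column names follow the diagram (with the `time` columns renamed to avoid keyword clashes); the column types and the exact key composition (run number plus event number as the event key, file ID as a foreign key) are assumptions inferred from the legend, not the production schema.

```sql
-- Sketch of the raw-event metadata tables; names follow the slide,
-- types and key composition are assumptions.
CREATE TABLE run (
  run_number NUMBER NOT NULL,         -- # part of the primary key
  run_time   DATE,                    -- o nullable
  status     NUMBER,                  -- o nullable
  logbook    VARCHAR2(4000),          -- o nullable
  CONSTRAINT run_pk PRIMARY KEY (run_number)
);

CREATE TABLE raw_file (
  file_id   NUMBER        NOT NULL,   -- # part of the primary key
  file_name VARCHAR2(255) NOT NULL,   -- u unique constraint
  CONSTRAINT raw_file_pk PRIMARY KEY (file_id),
  CONSTRAINT raw_file_name_uq UNIQUE (file_name)
);

CREATE TABLE event_hdr (
  run_number     NUMBER NOT NULL,     -- r foreign key, part of the PK
  event_number   NUMBER NOT NULL,     -- # part of the primary key
  file_id        NUMBER NOT NULL,     -- r foreign key into RAW_FILE
  event_size     NUMBER NOT NULL,
  event_filepos  NUMBER NOT NULL,     -- byte offset inside the DATE file
  burst_number   NUMBER NOT NULL,
  event_in_burst NUMBER NOT NULL,
  trigger_mask   NUMBER NOT NULL,
  event_time     NUMBER NOT NULL,
  error_code     NUMBER NOT NULL,
  CONSTRAINT event_hdr_pk PRIMARY KEY (run_number, event_number),
  CONSTRAINT event_hdr_run_fk  FOREIGN KEY (run_number)
    REFERENCES run (run_number),
  CONSTRAINT event_hdr_file_fk FOREIGN KEY (file_id)
    REFERENCES raw_file (file_id)
);
```

DST FILE and DST HEADER would follow the same pattern, with DST HEADER pointing into the streamed-object files via DST filepos. Narrow, mostly numeric rows like these are consistent with the ~50 bytes/row figure reported in the statistics slide.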

9. Migration Data Flow Diagram
[Diagram: Objectivity database files are staged from 9940 tapes through Castor onto input disk pools (2 x 200 GB) attached to each processing node; the node rewrites them as DATE files, records metadata in Oracle and a log, and writes the output to a Castor output disk pool backed by 9940B tape drives. Overall data throughput: 10 MB/s per node.]

10. Migration System Setup
- Hardware setup (all nodes are standard 'CERN disk server PCs': 2x1 GHz PIII systems with Gigabit Ethernet):
  - 11 processing nodes with 500 GB of disk each
  - 11 disk servers for the Castor (HSM) output pool: 4x220 GB + 2x450 GB + 5x550 GB = 4.5 TB
  - 3 main Oracle databases (500 GB of disk each)
  - Migration manager
  - 1 migration database
  - Castor manager (stager)
  - 8 dedicated input tape drives, 10 output tape drives
- Data processing model (see the bookkeeping sketch below):
  - An entire tape is read as a single HSM operation.
  - Migration transaction granularity: 1 file (one Objectivity database file).
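The slides state only that a separate migration database tracked progress with a transaction granularity of one Objectivity file; a bookkeeping table along the following lines would support that model. The table and column names here are hypothetical, not taken from the actual migration system.

```sql
-- Hypothetical bookkeeping for the migration database: one row per
-- Objectivity database file, one transaction per migrated file.
CREATE TABLE migration_file (
  file_id    NUMBER        NOT NULL,
  federation VARCHAR2(64)  NOT NULL,
  file_name  VARCHAR2(255) NOT NULL,
  state      VARCHAR2(8)   DEFAULT 'PENDING' NOT NULL
             CHECK (state IN ('PENDING', 'RUNNING', 'DONE', 'FAILED')),
  node       VARCHAR2(32),            -- processing node handling the file
  finished   DATE,
  CONSTRAINT migration_file_pk PRIMARY KEY (file_id)
);

-- After all events of a file have been converted and their headers
-- inserted, the status update commits in the same transaction, so a
-- crash never leaves a half-migrated file recorded as complete:
UPDATE migration_file
   SET state = 'DONE', finished = SYSDATE
 WHERE file_id = :current_file_id;
COMMIT;
```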

11. Project Timeline
- Summer 2002: initial ideas about the migration; designs and testing of Oracle features (VLDB, RAC, partitions, Linux, the C++ binding OCCI)
- Fall 2002: implementation
- November: testing and integration
- Mid-December: migration start
- Christmas CERN closure (3 weeks): running unattended
- Mid-January 2003: achieved the full planned speed of 100 MB/s sustained
- 20 February 2003: migration completed, ahead of schedule
- Mid-March 2003: the migrated data and the database system achieve production status

12. Migration Performance
[Performance chart; not reproduced in the transcript.]

13. COMPASS RAW Data Statistics
- We have migrated 12 Objectivity federations: 290 TB, 300,000 files...
- ...into 3 Oracle databases:
  - 12 user accounts
  - 335 GB total data tablespace size
  - 6,100 million rows (~50 bytes/row); expecting 10,000-15,000 million rows after event reconstruction
- ...and into 220 TB of raw data files in Castor:
  - 20% data reduction
  - Close to 10% of the files were empty and were therefore removed
  - Only 80 files failed migration (tape or Objectivity read problems)
- We have copied 15% of all CERN data in Castor!

14. The New Storage System
- The new COMPASS event database system consists of 3 Oracle9i servers running on commodity PC hardware with RedHat 7.2.
  - This configuration is expected to handle the raw and reconstructed data from one year of data taking.
  - We are still investigating an Oracle RAC setup with commodity shared disk storage; the goal is to minimize the number of maintained databases.
  - No Oracle-specific features were used, so data export to other sites should be easy.
- RAW data is stored in Castor.
  - It is accessed directly from the application, without going through any central server (see the query sketch below).
  - The number of Castor disk pools can be changed dynamically as needed.
  - Each event can be accessed immediately, subject to file caching on the disk server.
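An illustrative query for the navigational access pattern, using the schema reconstruction from slide 8 (table and column names are assumptions): the client resolves an event to a Castor file name and byte offset in Oracle, then opens that file directly and seeks to the offset, with no central data server in between.

```sql
-- Look up where one raw event lives: which Castor file, at what
-- offset, and how many bytes to read. The application then opens
-- the file directly (e.g. via RFIO) and seeks to EVENT_FILEPOS.
SELECT f.file_name, e.event_filepos, e.event_size
  FROM event_hdr e
  JOIN raw_file  f ON f.file_id = e.file_id
 WHERE e.run_number   = :run
   AND e.event_number = :event;
```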

15. Performance of the New System
Ref.: V. Duic / M. Lamanna, "The COMPASS event store in 2002", Parallel Session 8: Data Management and Persistency.

16. Conclusions
- The COMPASS Objectivity data migration was finished successfully, on schedule and with minimal resource investment.
- The new event database implemented on an RDBMS provides the required functionality: navigational access, speed, and scalability (it is used for DST production with >350 concurrent processes).
- It was a very valuable exercise in handling large event databases for the IT/DB group, and a proof of the usability of the Oracle database on commodity hardware with Linux.

