Objectivity Data Migration
Marcin Nowak, CERN Database Group
CHEP 2003, March 24-28, La Jolla, California

Slide 2: Overview
- Objectivity/DB at CERN: history and the current situation
- Migration plans
- Designing a new storage system based on RDBMS
- Migrating the data: system setup, performance, results
- The new storage system in production
- Conclusions

Slide 3: Objectivity/DB at CERN
- Objectivity/DB was introduced at CERN in 1995 through the RD45 project
  - Fully object-oriented database management system
  - Strong C++ and Java bindings, petabyte scalability
  - It has been used successfully to store physics data by many experiments, both LHC and pre-LHC
  - The main focus was always on applications for the LHC
- Following the change of persistency baseline of the LHC experiments, the Objectivity/DB maintenance contract for CERN was discontinued at the end of 2001
- Existing CERN Objectivity licenses are perpetual and allow using the software on the supported platforms indefinitely
  - Objectivity v6.1.3: RedHat 6.x with g++ and Solaris 7-8
  - No maintenance means no new releases or bug fixes, and no support for new versions of operating systems or compilers
  - Increasingly difficult to support Objectivity applications
- The end of Objectivity support was agreed with the experiments through the FOCUS committee

Slide 4: Objectivity Databases in CERN Experiments
- LHC experiments
  - Event data
    - So far it is only test data, so there is no need to preserve it for long
    - The longer time scale permits waiting for a new, complete storage solution provided by the LHC Computing Grid (LCG) projects (see the POOL project presentations in the 'Persistency' track on Monday)
  - Measurement data (detector construction data, etc.)
    - Migration to relational databases (JDBC, MySQL)
- Pre-LHC experiments
  - COMPASS
    - 300 TB of event data, collected mainly in 2002
    - Running experiment: will collect more data in 2003-2004, expecting 0.5-1 PB in total
  - HARP
    - 30 TB of event data

Slide 5: Migration Project
- Migration project scope:
  - Migrate HARP and COMPASS physics data to a completely new persistency technology
  - The COMPASS migration has to be finished before 2003 data taking starts (May 2003)
    - Since the storage system cannot be changed during data taking, missing this deadline would risk an additional year of Objectivity support
  - Migrate data to new tape media, available only at the end of 2002
- Migration tasks:
  - Design and development of a new storage system
  - Data migration
  - Adaptation of the experiments' software frameworks
- Manpower: 2-3 FTEs
- Budget: none; use CERN/IT shared hardware resources

Slide 6: Designing the New System
- The databases of both experiments share basic design principles:
  - Raw events as BLOBs in the original online format (DATE); database files are stored in Castor (the CERN HSM)
  - Limited event metadata (run information, event headers)
  - Conditions data is kept in the ConditionsDB (CDB)
  - COMPASS stores reconstructed events (DST) as persistent objects
  - Physics analysis data is stored outside Objectivity
- The proposed new data storage system:
  - A hybrid solution based on a relational database and flat files, a la POOL
  - Preserving essential features of the current system: navigational access to events and reconstructed events

Slide 7: New Data Storage - Details
- Raw events
  - Original DATE format, flat files kept in Castor
- Metadata
  - Mainly event headers and navigational information for raw and reconstructed events, stored in a relational database
  - 0.1%-0.2% of the raw data volume
  - At CERN stored in Oracle (but without Oracle-specific features)
  - Possibility to use another database at outside institutes
- Conditions data
  - Migrated to the new CDB implementation based on Oracle
  - No interface change (abstract interface)
- DST
  - Similar to raw events, but using hard-coded object streaming

Slide 8: Database Schema
Legend: # attribute is part of the primary key; * attribute cannot be null; o null value allowed; r attribute is a foreign key; u attribute is part of a unique constraint. The diagram also distinguishes possible vs. necessary data relations and one-to-many relations, with foreign keys forming part of the primary key of the referencing table.
- RUN: # run number, o time, o status, o logbook
- EVENT HDR: # event number, * event size, * event filepos, * burst number, * event in burst, * trigger mask, * time, * error code
- RAW FILE: # file ID, u file name
- DST HEADER: # event number, * DST size, * DST filepos, * trigger mask, o value1, o value2, o value3
- DST FILE: # file ID, u file name, * DST version, * DST type, o value1 descr, o value2 descr, o value3 descr
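To make the navigational access concrete, here is a minimal sketch of an event lookup against these tables using the Oracle C++ binding (OCCI) mentioned on the project timeline. It is illustrative only: the run_number and file_id foreign-key columns in EVENT HDR are assumed (the relation arrows of the diagram did not survive the transcript), identifier spellings are adapted to SQL, and the connection parameters and run/event numbers are placeholders.

```cpp
// Hypothetical navigational lookup: from (run, event) to the Castor DATE file,
// byte offset and event size. Table/column names follow the diagram above;
// the foreign keys and credentials are assumptions, not the actual schema/code.
#include <occi.h>
#include <iostream>
#include <string>

using namespace oracle::occi;

int main()
{
    Environment* env  = Environment::createEnvironment();
    Connection*  conn = env->createConnection("compass_reader", "secret", "compassdb"); // placeholders

    Statement* stmt = conn->createStatement(
        "SELECT f.file_name, e.event_filepos, e.event_size "
        "FROM   event_hdr e, raw_file f "
        "WHERE  e.file_id = f.file_id "
        "AND    e.run_number = :1 AND e.event_number = :2");
    stmt->setInt(1, 20817);   // example run number (illustrative)
    stmt->setInt(2, 42);      // example event number (illustrative)

    ResultSet* rs = stmt->executeQuery();
    if (rs->next()) {
        std::string fileName = rs->getString(1);          // DATE flat file in Castor
        long        filePos  = (long) rs->getNumber(2);   // byte offset of the event in the file
        int         evtSize  = rs->getInt(3);             // event size in bytes
        std::cout << fileName << " @" << filePos << " (" << evtSize << " bytes)\n";
    }

    stmt->closeResultSet(rs);
    conn->terminateStatement(stmt);
    env->terminateConnection(conn);
    Environment::terminateEnvironment(env);
    return 0;
}
```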

Slide 9: Migration Data Flow Diagram
[Flow diagram, labels only: Objectivity database files, input disk pools (2x200 GB), processing node, Castor, output disk pool, Oracle, log, DATE files; overall data throughput about 10 MB/s per node. Objectivity database files are converted on the processing nodes into DATE files (written back to Castor via the output disk pool) and metadata (written to Oracle).]

Slide 10: Migration System Setup
- Hardware setup:
  - All nodes are standard 'CERN disk server' PCs: 2x1 GHz PIII systems, Gigabit Ethernet
  - 11 processing nodes with 500 GB disk each
  - 11 disk servers for the Castor (HSM) output pool: 4x220 GB + 2x450 GB + 5x550 GB = 4.5 TB
  - 3 main Oracle databases (500 GB disk each)
  - Migration manager and 1 migration database
  - Castor manager (stager)
  - 8 dedicated input tape drives, 10 output tape drives
- Data processing model:
  - Reading an entire tape as a single HSM operation
  - Migration transaction granularity: 1 file (Objectivity database file)
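The processing model above (one HSM recall per tape, one database transaction per Objectivity file) can be sketched roughly as follows. This is not the actual migration code: the helper names are hypothetical stubs standing in for the real Objectivity-to-DATE conversion and the Oracle bulk inserts.

```cpp
// Hypothetical sketch of the per-file migration transaction.
#include <iostream>
#include <stdexcept>
#include <string>
#include <vector>

// Placeholder stubs (illustrative names, not the real migration tool).
std::vector<std::string> stageTape(const std::string& tapeId)        // whole tape = one HSM operation
{ return { "objy_db_001.db", "objy_db_002.db" }; }                   // illustrative file list
void convertObjyFileToDate(const std::string& objyFile,
                           const std::string& dateFile) { /* read Objy events, write DATE file to Castor */ }
void insertEventHeaders(const std::string& dateFile)    { /* bulk-insert event headers into Oracle */ }
void commitDatabase()   { /* commit metadata for this file */ }
void rollbackDatabase() { /* undo partial metadata for this file */ }

int main()
{
    for (const std::string& objyFile : stageTape("TAPE0001")) {      // tape ID is a placeholder
        const std::string dateFile = objyFile + ".date";             // illustrative naming
        try {
            convertObjyFileToDate(objyFile, dateFile);   // raw events -> DATE flat file
            insertEventHeaders(dateFile);                // event headers -> metadata database
            commitDatabase();                            // transaction granularity: 1 Objectivity file
        } catch (const std::exception& e) {
            rollbackDatabase();                          // a failed file can be retried on its own
            std::cerr << "migration failed for " << objyFile << ": " << e.what() << '\n';
        }
    }
}
```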

Slide 11: Project Timeline
- Summer 2002: initial ideas about the migration; designs, testing of Oracle features (VLDB, RAC, partitions, Linux, C++ binding (OCCI))
- Fall 2002: implementation
- November: testing and integration
- Mid-December: migration start
- Christmas CERN closure (3 weeks): running unattended
- Mid-January 2003: achieved the full planned speed of 100 MB/s sustained
- 20 February 2003: migration completed, ahead of schedule
- Mid-March 2003: migrated data and database system achieve production status
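As a rough cross-check of these figures: with 11 processing nodes at about 10 MB/s each (slides 9-10), the expected aggregate is roughly 110 MB/s, consistent with the 100 MB/s sustained rate reached in January.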

Slide 12: Migration Performance
[Performance plot; not reproduced in the transcript.]

Slide 13: COMPASS RAW Data Statistics
- We have migrated 12 Objectivity federations: 290 TB of files
- Into 3 Oracle databases:
  - 12 user accounts
  - 335 GB total data tablespace size
  - 6100 million rows, at about 50 bytes/row; more rows expected after event reconstruction
- ...and into 220 TB of raw data files in Castor
  - 20% data reduction
  - Close to 10% of the files were empty and were therefore removed
  - Only 80 files failed migration (tape or Objectivity read problems)
- We have copied 15% of all CERN data in Castor!
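A quick consistency check on the metadata volume: 6100 million rows at roughly 50 bytes per row is about 305 GB, in line with the quoted 335 GB total tablespace size.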

Slide 14: The New Storage System
- The new COMPASS event database system consists of 3 Oracle9i servers running on commodity PC hardware with RedHat 7.2
  - This configuration is expected to handle the raw and reconstructed data from one year of data taking
  - Still investigating an Oracle RAC setup with commodity shared disk storage; goal: minimize the number of maintained databases
  - No Oracle-specific features were used, so data export to other sites should be easy
- RAW data is stored in Castor
  - Accessed directly from the application (not going through any central server)
  - The number of Castor disk pools can be changed dynamically according to need
  - Each event can be accessed immediately, subject to file caching on the disk server
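A minimal sketch of what "accessed directly from the application" can look like, assuming the CASTOR RFIO C client API (rfio_open / rfio_lseek / rfio_read); the header name and exact signatures differ between CASTOR releases, and the path, offset and size shown here are placeholders that would come from the metadata query sketched earlier.

```cpp
// Hypothetical direct read of one raw event from a DATE file in Castor via RFIO.
#include <cstdio>
#include <vector>
#include <fcntl.h>

extern "C" {
#include <rfio_api.h>   // CASTOR RFIO client API (header name varies by release)
}

int main()
{
    char path[]       = "/castor/cern.ch/compass/raw/run20817.date"; // illustrative path
    const long offset = 123456L;   // event_filepos from the metadata DB (illustrative)
    const int  size   = 65536;     // event_size from the metadata DB (illustrative)

    int fd = rfio_open(path, O_RDONLY, 0);
    if (fd < 0) { std::fprintf(stderr, "rfio_open failed\n"); return 1; }

    std::vector<char> event(size);
    rfio_lseek(fd, offset, SEEK_SET);            // jump straight to the requested event
    int nread = rfio_read(fd, &event[0], size);  // read one DATE event record
    std::printf("read %d bytes\n", nread);

    rfio_close(fd);
    return 0;
}
```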

CHEP 2003Marcin Nowak, CERN DB group15 Ref. V. Duic/ M. Lamanna. “The COMPASS event store in 2002”. Parallel Session 8: Data Management and Persistency. Performance of the New System

Slide 16: Conclusions
- The COMPASS Objectivity data migration has been finished successfully, on schedule and with minimal resource investment
- The new event database implemented on an RDBMS provides the required functionality:
  - Navigational access
  - Speed
  - Scalability: used for DST production with >350 concurrent processes
- A very valuable exercise in handling large event databases for the IT/DB group
  - Proof of the usability of an Oracle database on commodity hardware with Linux