Presentation is loading. Please wait.

Presentation is loading. Please wait.

Emulsion Database Design Status Report Cristiano Bozza European Emulsion Group LNGS, May 2003 Updated DB Schema Distributed DB Implementation DB Client.

Similar presentations


Presentation on theme: "Emulsion Database Design Status Report Cristiano Bozza European Emulsion Group LNGS, May 2003 Updated DB Schema Distributed DB Implementation DB Client."— Presentation transcript:

1 Emulsion Database Design Status Report Cristiano Bozza European Emulsion Group LNGS, May 2003 Updated DB Schema Distributed DB Implementation DB Client Technologies DB Client Libraries Conclusions Extends Opera Internal Note # 38: “Database Architecture for the European Emulsion Scanning System”

2 Updated DB Schema

3

4

5 Overall structure unaltered, only fine tuning of data types and table columns To assess performance, tuning must be done in real-life conditions Not only benchmarks, but real use of DB for test exposure data

6 Updated DB Schema DB size estimate Assumptions for minimum size: Few data from CS (< 100 tracks per sheet), so they are negligible Intercalibration through 4 track maps per sheet, 100 tracks / map All sheets in a brick are intercalibrated (conservative) 1 candidate is followed on 28 sheets on average (conservative) 1 “interesting points” per event (vtx, decays, e.m. showers,...) (depends on Physics as well as instrumental needs) 25000 events (OPERA half data set + tests, technical runs...) 3 pass scanning (Vtx location, selection, precision measurement) 100 cosmics / background / fake microtracks in each zone

7 Updated DB Schema DB size estimate Assumptions for maximum size: Few data from CS (< 100 tracks per sheet), so they are negligible Intercalibration through 4 track maps per sheet, 100 tracks / map All sheets in a brick are intercalibrated (conservative) 10 candidates are followed on 28 sheets on average (conservative) 3 “interesting points” per event (vtx, decays, e.m. showers,...) (depends on Physics as well as instrumental needs) 50000 events (OPERA full data set + tests, technical runs...) 3 pass scanning (Vtx location, selection, precision measurement) 100 cosmics / background / fake microtracks in each zone

8 Updated DB Schema DB size estimate Results: (depends on Physics as well as instrumental needs) Assuming 100 bytes / track + aligned reconstructions, we have 0.11÷2.3 TB Including safety factors for possible underestimations, the DB size should stay within DBSize < 2.5 TB To estimate the network bandwidth needed, we assume that the full dataset is read 10 times during 5 years’ data taking 0.055÷1.2 Mbit/s ON AVERAGE

9 Distributed DB Implementation Core DB Workstation DB Group Scanning DB

10 Distributed DB Implementation Core DB Group: Centralized location Full OPERA emulsion data set Multimaster replication (each machine has one copy) Scanning DB: One machine per each scanning site (+ optional backup) Copy of the subset of locally produced scanning data Materialized views (each machine stores only locally produced data) The full dataset is still accessible in a transparent way Minimum network traffic

11 Distributed DB Implementation Core DB Group Simulation: Two Dell servers with high speed (1 Gbit) Ethernet connection Scanning DB Simulation: One Dell server machine with a materialized view of Salerno data Normal 100 Mbit/s LAN connection Pilot DB farm

12 DB Client Technologies Several client connection technologies are being explored Goal: highest possible data availability for Oracle data Windows: ODBC, OLE DB, Oracle ODP.NET, ADO, ADO.NET Linux: Oracle OCI C++ libraries, Perl, Tcl/Tk, Python, GNOME-DB, J2EE Mono (.NET) Oracle provider

13 DB Client Technologies Oracle and Windows: Working with DB is common practice in the Windows community. Oracle provides both the DB server and several client tools / libraries. Oracle 9i Database Server consists of 3 installation disks (2.4 GB). We have made several performance tests to choose the best access method. Since our code will be under.NET, the ADO.NET layer was common to all tests. ODBC: general DB library. OLE DB: 50% faster than ODBC. Oracle Data Provider (ODP.NET): slightly slower (~10%?) than OLE DB. OCI C++: low level API, much harder than other methods. OK!

14 DB Client Technologies Oracle and Linux: Oracle is fully committed to supporting the Linux operating system. Indeed, Oracle was the first commercial database available on Linux. All key Oracle products including Oracle 9i Database, Application Server, Collaboration Suite, Developer Suite and E-Business Suite. Oracle 9i Database consists of 3 cpio archive files ( ~ 1.4 Gb). We installed it on a Red Hat 7.3 Linux Distribution.

15 DB Client Technologies Oracle and Linux: OCI C++ Oracle Call Interface (OCI) is the Oracle software allowing access to the database from an external application. In principle, you can use this C-based API to build a Database application from scratch. Oracle and Linux: Perl Perl is probably the most famous open source language. It is an interpreted scripting language easy to learn and extremely quick. It is widely used in Internet and Database applications.

16 DB Client Technologies Oracle and Linux: Perl Oracle DB Oracle OCI Perl 5 Script Perl DBI DBD:: Oracle Perl Database applications are based on the DBI module, an object oriented architecture module. This module requires a specific database dependent driver (DBD::Oracle) to connect to the database

17 DB Client Technologies Oracle and Linux: Perl A sample Perl script to access data from Oracle use strict ; use DBI ; # Connection to the DB my $dbh = DBI -> connect(dbi:Oracle:operadb,operausr,operapwd); # Database SQL Query my $sql = qq { SELECT Grains,SlopeX,SlopeY FROM MIPBaseTracks}; # Execute SQL Query my $sth = $dbh ->prepare ($sql); $sth -> execute (); # Loop on the table While (my ($run,$event,$ntracks)= $sth->fetchrow_array){ # Here you can do whatever you want with the data }

18 DB Client Technologies Oracle and Linux: Tcl/Tk Tcl is an excellent scripting language created in 1987 by John Ousterhout. In 1988 he started to develop a graphic tool called Tk. From that moment on, Tcl/Tk is one of the most favourite language in the Open Source community. Oratcl is the module allowing the connection to an Oracle database. The connection to the database can be tested interactively using the Tcl shell.

19 DB Client Technologies Oracle and Linux: Tcl/Tk A sample Tcl/Tk script to access data from Oracle tclsh %package require Oratcl %set handle [oralogon operausr/operapwd@operadb] %set cursor [oraopen $handle] %orasql $cursor {select Grains,SlopeX,SlopeY from MIPBaseTracks} %orafetch $cursor 25 0.400 0.003 32 0.231 0.005 …………… %oraclose $cursor %oralogoff $handle %exit

20 DB Client Technologies Oracle and Linux: Perl/Tk Tk is wrapped inside Perl. So you can combine Perl quickness with Tk graphics capabilities. Oraexplain is the Perl/Tk module used to connect a program to an Oracle database. Oraexplain scripts are more complex because they involve graphical elements like windows, buttons and frames. You can use them to develop login windows or more complex graphics applications.

21 DB Client Technologies Oracle and Linux: Python Python is a GUI open source and object-oriented scripting language created by Guido van Rossum. It is used for any kind of application: GUI, XML, e- mail, …. DCOracle is the module used to connect a Python script to an Oracle database. With Python you can define classes and re-use them in other scripts.

22 DB Client Technologies Oracle and Linux: GNOME-DB The GNOME-DB project aims to provide a free unified data access architecture to the GNOME project. GNOME-DB is useful for any application that accesses persistent data (not only databases), since it now contains a pretty good data management API. We are currently testing the GNOME-DB libraries. Oracle and Windows/Linux: Mono ADO.NET Data Provider for Oracle databases works on Windows and Linux with Oracle 8i and higher versions. We are currently testing the Mono ADO.NET libraries. Tests in progress: Web applications with Apache, JSP, PHP, Java, JDBC

23 DB Client Libraries We are developing a full set of.NET / Mono classes specific to Opera DB OperaDb OperaDb::Connection OperaDb::Transaction OperaDb::Scanning OperaDb::Scanning::Batch OperaDb::Scanning::LinkedZone OperaDb::Scanning::MIPBaseTrack OperaDb::Scanning::MIPIndexedEmulsionTrack OperaDb::TotalScan OperaDb::TotalScan::Layer OperaDb::TotalScan::Segment OperaDb::TotalScan::Track OperaDb::TotalScan::Volume OperaDb::ComputingInfrastructure OperaDb::ComputingInfrastructure::Machine OperaDb::ComputingInfrastructure::ProgramSettings OperaDb::ComputingInfrastructure::User OperaDb::ComputingInfrastructure::UserPermissions OperaDb::ComputingInfrastructure::Site Namespaces in black Classes in dark red Who does not want to use SQL can program the DB in C++, C#, VB, FORTRAN, and so on, using these classes More classes to come...

24 DB Client Libraries Code samples Sample #1: How to store a scanning zone into Opera DB C++: idZone = LinkedZone::Save(pMyZone, idBatch, rawDataPath, startTime, endTime, dbConn, dbTrans); C#: idZone = LinkedZone.Save(MyZone, idBatch, rawDataPath, startTime, endTime, dbConn, dbTrans); Sample #2: How to retrieve a volume reconstruction from Opera DB C++: Volume *pVol = new Volume(dbConn, dbTrans, idVolume, true); C#: Volume Vol = new Volume(dbConn, dbTrans, idVolume, true); Sample #3: How to convert a scanning zone from Opera DB to ROOT file C++: SySal:: Root::Opera::LinkedZone::Save(new LinkedZone(dbConn, dbTrans, idZone), filePath); C#: SySal.Root.Opera.LinkedZone.Save(new LinkedZone(dbConn, dbTrans, idZone), filePath);

25 Conclusions The DB architecture that has been proposed some months ago is working Oracle 9iDS looks a very good choice (easy to implement, maintain, and develop; widely supported) Even people that are not familiar with SQL can easily work with the proposed structure of emulsion DB using interface libraries All interesting OS are supported by Oracle and by our interface libraries Conversion to Root data is trivial (1 line of code)...everything fine up to now! Interface libraries for OperaDB are almost (95%) complete


Download ppt "Emulsion Database Design Status Report Cristiano Bozza European Emulsion Group LNGS, May 2003 Updated DB Schema Distributed DB Implementation DB Client."

Similar presentations


Ads by Google