Oracle to MySQL synchronization Gianni Pucciani CERN, University of Pisa.


1 Oracle to MySQL synchronization Gianni Pucciani CERN, University of Pisa

25 March 2008, CERN IT-DM Technical Meeting, gianni.pucciani@cern.ch

2 Outline
- Ph.D. topic and deliverable.
- Basic concepts of replication and replica synchronization.
- CONStanza, a Replica Consistency Service for Data Grids.
- Oracle to MySQL synchronization.
- Functional (Conditions DB replication) and performance tests.
- Links and publications.

3 Ph.D. thesis at the University of Pisa: “The Replica Consistency Problem in Data Grids”
- Research started in 2003 with a Master's thesis: an OptorSim module for replica synchronization.
- Prototype implementation started in 2004, for file replication.
- Around 2005 the focus switched to database synchronization, especially Oracle to MySQL (VOMS used to be a use case).
- Current status: an advanced prototype has been tested by replicating Conditions databases from Oracle to MySQL, using COOL as the application to store and retrieve data. Basic file synchronization is also provided.

4 Replication and replica consistency
Data replication is a well-known strategy to achieve:
- fast data access
- fault tolerance
- load distribution
What is needed:
- Replica Catalogues to store the (lfn, pfn) mapping between logical and physical file names.
- Dynamic replication services (when and where to create new replicas, which replica is best to access).
- A Replica Consistency Service (RCS) to keep replicas consistent when one of them is updated. This component is missing in current Grid middleware implementations.
Both application data and Grid service metadata need the RCS.

5 Replica synchronization mechanisms
- Synchronous (pessimistic, strict): data are always synchronized. Not suitable for WANs with unreliable links.
- Asynchronous (optimistic, lazy): more flexible, with weaker consistency guarantees.
Design choices for asynchronous methods:
- Where can an update be done? Single master vs. multi master.
- What is transferred as an update? Content vs. log transfer.
- Who transfers an update? Push vs. pull.
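The three design questions for asynchronous methods can be read as independent axes. The sketch below only illustrates that classification; the class and field names are invented for this example and are not part of CONStanza. The example instance uses the combination the CONStanza slide reports for databases (single master, log based, push).

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class UpdateSite(Enum):          # Where can an update be done?
    SINGLE_MASTER = "single master"
    MULTI_MASTER = "multi master"


class UpdatePayload(Enum):       # What is transferred as an update?
    CONTENT = "content transfer"
    LOG = "log transfer"


class UpdateInitiator(Enum):     # Who transfers an update?
    PUSH = "push"
    PULL = "pull"


@dataclass(frozen=True)
class AsyncReplicationScheme:
    """One point in the design space of asynchronous synchronization."""
    where: UpdateSite
    what: UpdatePayload
    who: UpdateInitiator

    def describe(self) -> str:
        return f"asynchronous, {self.where.value}, {self.what.value}, {self.who.value}"


# Every combination of the three axes (2 * 2 * 2 schemes).
all_schemes = [AsyncReplicationScheme(w, p, i)
               for w, p, i in product(UpdateSite, UpdatePayload, UpdateInitiator)]

# The combination reported for CONStanza's Oracle to MySQL case.
constanza_db = AsyncReplicationScheme(
    UpdateSite.SINGLE_MASTER, UpdatePayload.LOG, UpdateInitiator.PUSH
)
print(constanza_db.describe())
```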

6 The CONStanza project
www.cern.ch/pucciani/constanza
Team members: A. Domenici, F. Donno, G. Pucciani, H. Stockinger, student collaborations.
An advanced prototype has been implemented:
- C++ with gSOAP web services.
- GSI security with CGSI.
- Globus Toolkit 2.x for GridFTP file transfers.
- Flex/Bison for SQL dialect translation.
- For DBs (Oracle to MySQL): asynchronous, single-master, log-based push model.
- A quorum can be set to deal with disconnected sites.
- Automatic re-synchronization of failed sites.
- Partial replication possible at table level.
- Multithreaded update propagation (file transfers) to reduce the impact of stale reads on applications.
- Configuration files must be edited before starting the servers; synchronization is then performed automatically.
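The slides do not say how the quorum is evaluated. As a minimal sketch, assume the quorum is the minimum number of replicas that must acknowledge an update for the propagation to count as successful, and that sites which fail are remembered for later re-synchronization. The function name, its parameters and the site names are hypothetical and do not come from the CONStanza code base.

```python
from typing import Callable, List, Tuple


def propagate_update(
    replicas: List[str],
    send: Callable[[str], bool],
    quorum: int,
) -> Tuple[bool, List[str]]:
    """Push one update to every replica.

    Returns (quorum_reached, sites_to_resync). `send` is a hypothetical
    transport callback that returns True on acknowledgement and may raise
    ConnectionError for a disconnected site.
    """
    acked: List[str] = []
    failed: List[str] = []
    for site in replicas:
        try:
            ok = send(site)
        except ConnectionError:
            ok = False
        (acked if ok else failed).append(site)
    return len(acked) >= quorum, failed


# Usage with a fake transport in which one site is unreachable.
unreachable = {"site-b"}


def fake_send(site: str) -> bool:
    if site in unreachable:
        raise ConnectionError(site)
    return True


reached, to_resync = propagate_update(["site-a", "site-b", "site-c"], fake_send, quorum=2)
print(reached, to_resync)
```

Under these assumptions the update is accepted because two of the three sites acknowledged it, and the unreachable site is returned so that it can be re-synchronized once it comes back.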

7 RCS architecture
- The Global Consistency Service (GRCS) is the main user interface.
- Local Consistency Services (LRCSs) run on Storage Elements.

8 CONStanza: scenario for Oracle to MySQL synchronisation
[Diagram. Components: DB1 (Oracle), DB2 (MySQL), DB3 (MySQL), GRCS, LRCS1, LRCS2, LRCS3, Log Watcher, DBUpdater, GridFTP. Labelled steps: Update Master, Extract Log, Notify GRCS, Update Replica, Apply Update. Log files shown: LogDB1001, LogDB1000-MySQL, LogDB1001-MySQL.]
Advances in Computer Systems & Networks – Workshop (10 November 2006)

9 RCS security, modularity and fault tolerance capabilities
- With GSI, communications (Client-RCS and GRCS-LRCSs) are secure, and only authenticated users and servers can take part in the architecture.
- DBWatcher and DBUpdater have vendor-specific implementations. Little effort is needed to support other DBs, both on the master side (a log mining interface must be present) and on the slave side.
- DBWatcher-to-GRCS and GRCS-to-LRCS communications can deal with temporarily disconnected sites.
- A version mechanism prevents out-of-order update propagation.
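The slides do not describe how the version mechanism works. One common way to prevent out-of-order application, shown here only as an assumption, is to number updates sequentially, apply an update only when it is the direct successor of the last applied one, and buffer anything that arrives early. The class below is a hypothetical illustration, not the DBUpdater implementation.

```python
from typing import Dict, List


class VersionedReplica:
    """Applies updates strictly in version order, buffering early arrivals."""

    def __init__(self, start_version: int = 0) -> None:
        self.version = start_version                 # last applied version
        self.applied: List[str] = []                 # statements applied so far
        self._pending: Dict[int, List[str]] = {}     # early arrivals by version

    def receive(self, version: int, statements: List[str]) -> None:
        if version <= self.version:
            return  # duplicate or stale update: ignore it
        self._pending[version] = statements
        # Drain the buffer for as long as the next expected version is present.
        while self.version + 1 in self._pending:
            self.version += 1
            self.applied.extend(self._pending.pop(self.version))


replica = VersionedReplica()
replica.receive(2, ["INSERT second"])   # arrives early, so it is buffered
replica.receive(1, ["INSERT first"])    # fills the gap, so both are applied
print(replica.version, replica.applied)
```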

10 Testbed deployment
- Master read/write Oracle DB at CNAF.
- Slave read-only replicas at: INFN Pisa, SNS Pisa, INFN Bari, CERN.
- Master DB with 4 tables of up to 8 columns each, with varchar and number types of different sizes.
- Slave replicas hold only the first 3 tables (partial replication).

11 Performance analysis for Oracle to MySQL synchronization
Experimental design:
- Response variables:
  - update delay: y_autupdT = time last replica updated - time master updated;
  - time needed to extract updates from the master DB, create an update file, transfer the file, apply the updates to replicas, others...
- Factors:
  - A: number of secondary replicas, 4 levels (1, 2, 3 and 4 replicas, mainly bounded by machine availability);
  - B: size of the update to the master database, 4 levels (1, 10, 100 and 1000 rows per table).
- Two-factor full factorial design with 4 replications: 4 * 4 * 4 = 64 experiments.
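The experiment count follows directly from the design: every level of factor A is combined with every level of factor B, and each combination is repeated 4 times. A short sketch that enumerates the runs (the variable names are chosen for this example only):

```python
from itertools import product

replica_counts = [1, 2, 3, 4]        # factor A: number of secondary replicas
update_sizes = [1, 10, 100, 1000]    # factor B: rows per table in the update
replications = 4                     # repetitions of each (A, B) combination

# One tuple per experiment: (replicas, rows_per_table, repetition_index).
runs = [
    (a, b, rep)
    for a, b in product(replica_counts, update_sizes)
    for rep in range(1, replications + 1)
]

print(len(runs))  # 4 * 4 * 4 = 64
```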

12 Performance results 1/2
- y_autupdT: update delay = time last replica updated - time master updated.
- Updates (inserts) of different sizes issued on the master Oracle database.
- Number of secondary replicas varied from 1 to 4.
[Plot of update delay against update size; annotation on the plot: "= 8.98sec"]
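The definition of y_autupdT translates into a one-line computation: the delay is governed by the slowest replica. The sketch below uses made-up timestamps in seconds and hypothetical site names; it does not reproduce the measured values.

```python
from typing import Dict


def update_delay(master_updated_at: float, replica_updated_at: Dict[str, float]) -> float:
    """y_autupdT: time the last replica was updated minus time the master was updated."""
    return max(replica_updated_at.values()) - master_updated_at


# Hypothetical timestamps (seconds on a common clock).
example_delay = update_delay(
    master_updated_at=100.0,
    replica_updated_at={"replica-1": 103.5, "replica-2": 108.25},
)
print(example_delay)
```

Note that this definition needs a common time reference between the master and the replicas; how the clocks were aligned in the actual tests is not stated in the slides.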

13 Performance results 2/2
Comparing the time spent in the different tasks that contribute to the update propagation phase, the most time-consuming activity is the log extraction performed by the Oracle LogMiner.

14 T0-T1-T2 replication of conditions databases using Oracle Streams and CONStanza
- Insert data into the Tier-0 Oracle database and read them back at the Tier-2 MySQL replica, verifying that the right number of objects with the right payload is present.
- The Tier-1 DB is a slave replica for Streams and a master for CONStanza.
- The Tier-1 DB must be a single instance (no RAC) for the LogMiner to work properly.
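The functional check described above (right number of objects, right payload) can be sketched as a comparison between what was inserted at Tier-0 and what is read back at Tier-2. This is a generic illustration that does not use the COOL API; representing each object as a (key, payload) tuple is an assumption made for the example.

```python
from collections import Counter
from typing import Dict, List, Tuple

ConditionObject = Tuple[int, str]  # hypothetical (key, payload) representation


def verify_replica(
    inserted: List[ConditionObject],
    read_back: List[ConditionObject],
) -> Dict[str, object]:
    """Compare objects inserted at the master with objects read at the replica."""
    expected = Counter(inserted)
    actual = Counter(read_back)
    return {
        "count_ok": len(inserted) == len(read_back),
        "missing": sorted((expected - actual).elements()),      # inserted, not found
        "unexpected": sorted((actual - expected).elements()),   # found, never inserted
    }


inserted_at_t0 = [(1, "a"), (2, "b"), (3, "c")]
read_at_t2 = [(1, "a"), (2, "B"), (3, "c")]   # one payload differs
report = verify_replica(inserted_at_t0, read_at_t2)
print(report)
```

In this example the object count matches but one payload does not, which is why the check compares payloads and not only counts.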

15 File synchronization
- Basic functionality for file synchronization is also provided, although application requirements are not yet clear.
- Using the GRCS CLI, a user can subscribe files for which the RCS will manage synchronization.
- A user can modify a single replica and then ask the RCS to propagate the new version to all replicas of the same lfn.
- Commands: rcs-subscribe-file, rcs-update-file

16 Links and publications
- http://pucciani.web.cern.ch/pucciani/constanza/index.html
- Gianni Pucciani, Andrea Domenici, Flavia Donno, Heinz Stockinger. Consistency of replicated datasets in Grid Computing. Submitted for the upcoming Encyclopedia of Grid Computing Technologies and Applications.
- Gianni Pucciani, Andrea Domenici, Flavia Donno, Heinz Stockinger. A Performance Study on the Synchronisation of Heterogeneous Grid Databases Using CONStanza. Submitted to Future Generation Computer Systems.
- Andrea Domenici, Flavia Donno, Gianni Pucciani, Heinz Stockinger. Relaxed Data Consistency with CONStanza. Sixth IEEE International Symposium on Cluster Computing and the Grid (CCGrid06), Singapore, 16-19 May 2006, IEEE Computer Society.
- Andrea Domenici, Flavia Donno, Gianni Pucciani, Heinz Stockinger. CONStanza: Data Replication with Relaxed Consistency. Technical Report INFN / TC_06 / 6, March 2006.
- Andrea Domenici, Flavia Donno, Gianni Pucciani, Heinz Stockinger, Kurt Stockinger. Replica consistency in a Data Grid. Proceedings of the IX International Workshop on Advanced Computing and Analysis Techniques in Physics Research, Tsukuba, Japan, December 1-5, 2003.

